The greater good


Governments, funding agencies and universities must all do their bit to ensure that research is appropriately assessed and rewarded.


Do researchers need extra incentives? The idea should seem strange to any young scientist — and a good many older ones too. Think of the theoretical physicist who will happily sit down at the weekend, not to read the newspapers but to play around with equations. Think of the cell biologist willing to put up with the burden of running and re-running painstaking experiments because it’s so difficult to make the set-up deliver — but whose instincts suggest that there is a nugget of insight at the end.

Now think of the head of an institute of, say, plant science. He or she will have no shortage of talented researchers wanting to understand fundamentals such as the precise mechanisms of influence of plant hormones at various stages of plant development. But what criteria will be used to assure the university or management board that the head has delivered? And what incentives will encourage the broadening of research in directions that scientific insight left to its own devices might not prioritize but that might, nevertheless, serve humanity well?

Young scientists and lab heads alike live in a world in which the social contract for science is changing all around them. This is happening in ways they can influence, if they are both lucky and astute, and from which they could and should benefit. Astute people, of course, often make their own good luck — finding themselves in the right place at the right time by being alert to the way the world is moving and engaging more broadly with interests around their disciplines than less adventurous academics might.


CHALLENGING CRITERIA 

National governments — important drivers of the social contract — 
have the power to do more than steer the direction of science through 
broad funding priorities. They can, through their funding agencies, 
seek to ensure at least two other outcomes: an appropriate assessment 
and reward of research achievement, and an appropriate degree of 
trust in the robustness of that achievement. 

For the former, there is plenty of action, but plenty of debate too. 
The head of the plant institute will know that it will be all too easy for 
assessors to focus excessively on the number of papers published in 
high-impact journals. (Nature, proud of its own high- and low-cited 
papers, has long challenged that inclination.) It is ever more impor- 
tant in researcher assessments to recognize the work that focuses on 
key societal challenges. Examples across the spectrum could include 
exploring how established techniques can enhance plant resistance to 
disease; work on climate-change adaptation; studies to enhance access 
to fresh water; and the development of psychological treatments for 
post-traumatic stress. Such work may at times be scientifically incre- 
mental, but has every bit as much claim on recognition. 

Some major universities are recognizing the importance of such 
challenges by establishing their own programmes that may include the 
natural and social sciences and humanities (see page 7). For example, 
the National University of Singapore has a cluster of research groups 


working on the future of high-density cities; the University of Cam- 
bridge, UK, has several departments collaborating in public health; 
and Monash University in Melbourne, Australia, has established a 
programme on aspects of fresh water. 

Such collaborations are not easy to make effective, and it is there- 
fore doubly important that ‘the system’ finds new ways to recognize 
and reward their outputs. In this respect, an interesting case to follow 
in 2014 is the United Kingdom's Higher Education Funding Council for England (HEFCE) and its dependent institutions. Its national research assessments have evolved over the years in ways that other agencies around the world have examined — although few, if any, have imitated the extreme extent to which the outcomes directly influence subsequent funding. But this year, for the first time, we shall see what the HEFCE has made of the thousands of statements of research impacts submitted last year by universities.

Researchers have often expressed alarm at the perceived tendency 
in such exercises to move from one extreme focus of assessment — 
journal impact factors — to another, in particular the anticipation of 
contributions to economic growth. This year will show whether the 
HEFCE can demonstrate the appropriate degree of breadth, nuance 
and critical assessment of such statements, while giving due recogni- 
tion to the socially valuable work advocated above. 

And what about research robustness — in other words, the trust that 
taxpayers can have that the appropriate standards of technical integrity, 
aka professionalism, are being followed in laboratories? Journals, not 
least Nature, are recognizing that they and their referees have a part to 
play in ensuring better standards of quantitative analysis, and of data and 
protocol transparency. Universities have a key role too — in ensuring a 
greater degree of researcher training and of lab stewardship in the quality 
of outputs than is often happening. That is a tough challenge, because 
vice-chancellors have so little power over how their academics behave, 
and those academics are all too often engaged in a rat race for funds. 

But 2014 should be a year in which funding agencies make clear 
their intentions in promoting rigorous lab standards, and there should
be a concomitant pressure on universities and institutes to demon- 
strate quality assurance of lab practices and culture. 

What is essential is that the motivation of young scientists to make 
a difference with their research is more broadly encouraged. They 
need strong mentoring and exemplars in doing a robust job and in 
contributing to the trust in their research community. Those who want 
to follow strong creative imaginations in discovering how the world 
works should be given full rein. So, too, should those more interested 
in using their creativity directly to make the world a better place. Those 
who have the luck to be able to do both and to be recognized for both 
will be in a sweet spot indeed.
Emerging powers need a more-inclusive science

Fast-growing economies can learn from the West’s mistakes and couple social and ‘hard’ sciences to address their own societal needs, says Colin Macilwain.

Rio de Janeiro’s peerless Copacabana beach has not changed much in the 15 years since I last visited, but there has been one innovation: all along the promenade, sturdy, open-air gym facilities invite locals and tourists alike to indulge in a little anaerobic exercise.

This free equipment seems an obvious approach to improving public health. But tell that to inhabitants of the Bronx in New York or of London’s East End, where the most visible signs of health science are the nearby glass towers in Washington Heights or Whitechapel, in which biologists develop drugs that largely benefit the well-to-do.

Rio’s outdoor gyms reflect the work of researchers such as epidemiologist Pedro Hallal at the Federal University of Pelotas in southern Brazil, who is part of an influential movement to better understand the links between mothers’ health, early-childhood exercise and lifelong health outcomes. This is the sort of societal research that the developing world needs as it expands its scientific influence.

“Let’s forge this connection between the social science and the hard sciences,” Michel Temer, vice-president of Brazil, told the 6th World Science Forum in Rio de Janeiro — the reason for my visit in November. The point was forcefully reiterated by Linxiu Zhang of the Chinese Academy of Sciences, and many other speakers.

Make no mistake: the geographical balance of power in global science is shifting. China has surpassed the United States as the world’s largest PhD factory (see Nature 472, 276-279; 2011) and about now, according to a 2011 report by Britain’s Royal Society, it is scheduled to surpass the US volume of scientific literature in research journals. Brazil awarded 14,000 PhDs last year.

The shift is accompanied by real political determination from the emerging powers to couple the social sciences with ‘hard’ science and engineering to address society’s needs. For their own pressing political reasons, the leaders of Brazil, China and other fast-growing economies need answers to mounting societal problems — water, food, health, energy and climate change, for example. That is not the case in the United States or Europe, where leaders’ priorities are short-term and financial, and science is arranged to suit various stakeholders — notably firms that supply drugs and military equipment — as well as the needs of scientists themselves and their universities.

There are well-charted historical reasons for the West’s narrow view of what constitutes science. Around 1900, scientists of the Royal Society of London distanced themselves from colleagues in the humanities (leading to the formation of the British Academy), and the US National Academy of Sciences followed the same path.

The outcome has been subtler in mainland Europe. The German word for science, Wissenschaft, acknowledges a wider body of knowledge than just the natural sciences, for example; and the former president of the prestigious European Research Council, Helga Nowotny, is a sociologist.

Yet the question of fair treatment for the social sciences is dogging 
the new European Union (EU) research programme Horizon 2020, the 
largest in the world outside the United States. Social scientists feel that 
they have been locked out of the drafting of the Horizon 2020 work 
programmes. At a 26 November meeting in Brussels on ‘smart cities’,
for example, speakers castigated the planned programme for concen- 
trating on technology-led pilots, even though the real roadblock is 
how people use the technologies we already have. 

These are not abstract, philosophical questions: quantitative behav- 
ioural research could readily fill knowledge gaps 
and design processes that would enable people 
to better manage their energy use, for example. 
But it does not happen because EU research 
programmes are also designed around the needs 
of stakeholders: in this case, device manufactur- 
ers, power companies and university scientists 
and engineers who know the ropes from previ- 
ous programmes. 

Another closely associated issue raised at the 
Rio meeting is the fact that global science still 
has a huge problem with research ‘silos’, in which
researchers are obliged to operate within insular, 
sometimes archaic disciplines. This was broached 
by physicist Luiz Davidovich, a director of the 
Brazilian Academy of Sciences in Rio, who called for the “reformula- 
tion of the university, towards interaction between disciplines”. But 
the West's funding agencies and universities — as well as its publishing 
industry — are all set up in ways that have persistently stymied such 
change. An opportunity surely exists for emerging scientific powers 
to do things differently as they grow, by building an interdisciplinary 
outlook into their structures. 

The World Science Forum is just one instrument that is attempting to 
address such problems. In 2012, the Global Research Council was cre- 
ated at the instigation of Subra Suresh, then director of the US National 
Science Foundation, as a vehicle for the wider governance of science. 

Existing worldwide organizations have limited influence, however. 
The new global agenda is more likely to be driven by the most powerful 
of the emerging powers: China, in particular, but also Brazil, India, 
South Korea and South Africa. That group of emerging nations has 
the opportunity, right now, to build a science that will serve not just 
the interests of national oligarchies, or of researchers themselves, but 
of society at large.


Colin Macilwain writes about science policy from Edinburgh, UK. 
e-mail: cfmworldview@googlemail.com 
SEVEN DAYS


POLICY 


EU clinical trials 

A long-running effort to 
reform the regulation of 
clinical trials in the European 
Union concluded on 

20 December. A new law will 
replace the much-maligned 
Clinical Trials Directive, and 
will streamline and standardize 
applications for trials. The 
Clinical Trials Regulation, 
which includes compulsory 
preregistration of all trials and 
tougher informed-consent 
requirements, must be 
formally approved before it can 
take effect. 


Primate problems
The US Department of Agriculture has fined Harvard Medical School in Boston, Massachusetts, more than US$24,000 for violating the Animal Welfare Act. Several of the 11 violations announced by the agency on 18 December occurred at Harvard’s troubled New England Primate Research Center in Southborough, Massachusetts, which is slated to close by mid-2015. In late 2011, two primates became dehydrated as the result of a malfunctioning water dispenser. One of the animals subsequently died.

E-cigarette rules
European legislators have shied away from tough proposals to regulate electronic cigarettes as medical devices. According to legislation agreed on 18 December, e-cigarettes will be subject to the less-stringent controls that are applied to tobacco products, unless they are marketed with health claims. Formal approval of the agreement, which includes policy changes for tobacco products, is needed before the rules can come into force in 2014. See go.nature.com/wobgkx for more.

Geneticist dies
Geneticist Janet Rowley of the University of Chicago in Illinois died on 17 December, aged 88. In the 1970s, Rowley identified a translocation — in which genetic material is swapped between chromosomes — in leukaemia cells. For her work on that and other translocations, Rowley was one of three scientists to share a Lasker Award in 1998 for clinical medical research. From 2002 to 2009, she served on former US President George W. Bush’s bioethics council, and was a vocal opponent of the Bush administration's restrictions on US embryonic stem-cell research.

EPA fraudster
John Beale, a former official at the US Environmental Protection Agency, was sentenced on 18 December to 32 months in prison for stealing nearly US$900,000 from the organization. Starting in 2000, Beale collected a salary and travel reimbursements while skipping a total of 2.5 years of work and falsely claiming to be working for the US Central Intelligence Agency. Beale has agreed to pay the government nearly $1.4 million in restitution and penalties.

UCSF chancellor
Susan Desmond-Hellmann will step down as chancellor of the University of California, San Francisco (UCSF), in March to become chief executive of the Bill & Melinda Gates Foundation in Seattle, Washington. UCSF said on 17 December that it will appoint Sam Hawgood, dean of its medical school, as interim chancellor, pending approval.


EVENTS


Star surveyor
The European Space Agency’s Gaia mission, designed to map the Milky Way in unprecedented detail, launched on 19 December from Kourou, French Guiana. The mission is tasked with charting a billion stars, with the aim of helping astronomers to better understand the origins of our Galaxy. See go.nature.com/bhxugp for more.

Pharma fees out
In a bid to mend its public image, pharmaceutical giant GlaxoSmithKline (GSK) will by 2016 start to phase out direct payments to physicians for attending medical conferences or giving promotional talks about GSK products, the firm said on 17 December. By early 2015, it also plans to scrap the use of prescription sales targets in determining pay for sales representatives. GSK currently faces investigation in China for allegedly bribing physicians and officials to boost its drug sales (see Nature 499, 385; 2013).


TREND WATCH


Raw data from research publications are vanishing rapidly, according to an analysis of 516 ecology papers published between 1991 and 2011 (T. H. Vines et al. Curr. Biol. http://doi.org/qpm; 2013). Data could be obtained for most of the 2011 papers, but availability fell by 17% for each previous year (see chart). As few as 20% of authors of papers from the early 1990s could provide data, as a result of the information being misplaced or stored on defunct technology. See go.nature.com/jmosxn for more.

MISSING DATA
As research articles age, the odds of their raw data being extant drop noticeably.
[Chart: data extant (assuming author responded) against age of paper (years)]
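The 17%-per-year decline compounds, which is why so little survives from the early 1990s. A rough sketch in Python of that compounding, assuming the decline applies to the odds of the data being extant (as the chart caption suggests). The starting probability of 0.9 for a two-year-old paper is a purely hypothetical anchor, and the function name is illustrative; neither comes from the study.

```python
def probability_extant(age_years, base_probability=0.9, base_age=2, yearly_drop=0.17):
    """Probability that a paper's raw data are still extant at a given age.

    Assumes the odds, p / (1 - p), shrink by `yearly_drop` for each extra year.
    `base_probability` at `base_age` is a hypothetical anchor, not a reported figure.
    """
    base_odds = base_probability / (1.0 - base_probability)
    odds = base_odds * (1.0 - yearly_drop) ** (age_years - base_age)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Compare a recent paper with ones roughly one and two decades old.
    for age in (2, 10, 22):
        print(f"{age:2d} years: {probability_extant(age):.2f}")
```

Under these assumed inputs the probability falls below one in five for a 22-year-old paper, the same order as the figure quoted for papers from the early 1990s, although the match depends entirely on the chosen starting value.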
CLIMATE Water will be scarce in warmer world, huge study says
PHYSICS With $50 million spent, X-ray source is now out in the cold
2014 Transgenic monkeys, comets and more will make the news this year
EPIGENETICS A scientist’s bid to track down the roots of aggression

The intense electrical discharge of the Z machine at Sandia National Laboratories.


PLASMA PHYSICS 


Triple-threat method 
sparks hope for fusion 


The secrets to its success are lasers, magnets and a big pinch. 


BY W. WAYT GIBBS 


The Z machine at Sandia National Laboratories in New Mexico discharges the most intense pulses of electrical current on Earth. Millions of amperes can be sent towards a metallic cylinder the size of a pencil eraser, inducing a magnetic field that creates a force — called a Z pinch — that crushes the cylinder in a fraction of a second.


Since 2012, scientists have used the Z pinch 
to implode cylinders filled with hydrogen 
isotopes in the hope of achieving the extreme 
temperatures and pressures needed for energy- 
generating nuclear fusion. Despite their efforts, 
they have never succeeded in reaching ignition 
— the point at which the energy gained from 
fusion is greater than the energy put in. 

But after tacking on two more components, 
physicists think they are at last on the right path. 
Researchers working on Sandia's Magnetized Liner Inertial Fusion (MagLIF) experiment added a secondary magnetic field to thermally insulate the hydrogen fuel, and a laser to preheat it (see ‘Feeling the pinch’). In late November, they tested the system for the first time, using 16 million amperes of current, a 10-tesla magnetic field and 2 kilojoules of energy from a green laser.

“We were excited by the results,” says Mark Herrmann, director of the Z machine and the pulsed-power science centre at Sandia. “We look at it as confirmation that it is working like we think it should.”

2 JANUARY 2014 | VOL 505 | NATURE | 9

The experiment yielded about 10¹⁰ high-energy neutrons, a measure of the number of fusion reactions achieved. This is a record for MagLIF, although it still falls well short of ignition. Nevertheless, the test demonstrates the appeal of such pulsed-power approaches to fusion. “A substantial gain is more likely to be achieved at an early date with pulsed power,” says nuclear physicist David Hammer of Cornell University in Ithaca, New York, who co-wrote a 2013 US National Research Council assessment of approaches to fusion energy.

With its relatively slim US$5-million annual 
budget, MagLIF is a David next to two fusion 
Goliaths: the $3.5-billion National Ignition 
Facility (NIF) at Lawrence Livermore National 
Laboratory in California, and the €15-billion 
(US$20-billion) ITER experiment under con- 
struction in France. (Sandia has about $80 mil- 
lion to operate the Z machine each year, but it 
serves other experiments in addition to Mag- 
LIF.) The NIF squashes fuel capsules using 
nearly 2 megajoules of laser energy, and ITER 
will use 10,000 tonnes of superconducting mag- 
nets in a doughnut-shaped ‘tokamak’ to hold a 
plasma in place to coax self-sustaining fusion. 

Both of the big projects have run into problems. After a concerted two-year effort, NIF fell well short of achieving ignition by a 2012 deadline. Its fusion yields have since increased markedly — nearly 10¹⁶ neutrons were created in a recent shot, Herrmann says — but the more than $300-million-a-year programme faces further budget cuts in 2014. Meanwhile, delays and budget overruns have become the norm at ITER. The facility is not expected to begin operations until 2027 — 11 years later than initially planned.

FEELING THE PINCH
Magnetized Liner Inertial Fusion uses a heating laser, a stabilizing magnetic field and a force called a Z pinch to implode a cylinder of hydrogen fuel.

In addition to being cheaper, MagLIF seems 
to have technical advantages. The laser not only 
preheats the hydrogen fuel, but also makes it 
more conductive — and thereby more suscepti- 
ble to the Z pinch. Furthermore, in a paper pub- 
lished late last year, MagLIF physicists showed 
evidence suggesting that the applied secondary 


magnetic field, as well as insulating the fuel, may 
have the happy side effect of stabilizing the cyl- 
inder as it implodes (T. J. Awe et al. Phys. Rev. 
Lett. 111, 235005; 2013). If so, that would cut 
down on hydrodynamic instabilities, which can 
disperse the energy and fuel before fusion can 
get going, says Stephen Slutz, a Sandia physicist 
who proposed the MagLIF system in 2009. 

In the next few years, MagLIF scientists plan 
to turn up all three dials at their disposal. They 
can boost the Z machine to up to 27 million 
amperes; they can ramp up the magnetic field 
to as high as 30 tesla; and they plan to upgrade 
the laser to 8 kilojoules. They also aim to switch 
from fuel made of the hydrogen isotope deu- 
terium to fuel containing both deuterium and 
another isotope, tritium — which should also 
lift yields. By 2015, they hope to achieve a yield
of 10¹⁶ neutrons, or about 100 kilojoules —
enough to show that ignition is within reach.
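
As a back-of-envelope check on how a neutron count maps to an energy yield: each deuterium-tritium fusion releases about 17.6 MeV and one neutron, so a yield of 100 kilojoules corresponds to a few times 10¹⁶ reactions. A minimal sketch in Python, assuming every neutron comes from one deuterium-tritium reaction and counting the full 17.6 MeV per reaction; the function names are illustrative and not taken from the article.

```python
MEV_TO_JOULES = 1.602176634e-13  # one MeV expressed in joules
DT_ENERGY_MEV = 17.6             # approximate energy released per D-T fusion


def yield_joules(neutrons):
    """Fusion energy in joules for a given number of D-T neutrons."""
    return neutrons * DT_ENERGY_MEV * MEV_TO_JOULES


def neutrons_for_yield(joules):
    """Number of D-T neutrons corresponding to a given fusion energy in joules."""
    return joules / (DT_ENERGY_MEV * MEV_TO_JOULES)


if __name__ == "__main__":
    print(f"{neutrons_for_yield(100e3):.2e} neutrons for 100 kJ")
    print(f"{yield_joules(1e16) / 1e3:.1f} kJ from 1e16 neutrons")
```

On this accounting, 10¹⁶ neutrons is closer to 30 kilojoules than 100, so the two figures in the text agree only to within an order of magnitude.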

It could be crucial to make progress quickly. 
The US National Nuclear Security Administra- 
tion, the division of the Department of Energy 
that funds the NIF, the Z machine and other
laser fusion efforts, plans to deliver an assess- 
ment to Congress in 2015 about the future of 
these technologies. If MagLIF hits its 100-kilo- 
joule goal, it could bolster an argument 
for upgrading the Z machine to 60 million 
amperes or more, which simulations suggest 
would be sufficient to reach ignition. 

“We’re all hoping that they will, in fact, find success with their early shots to justify the construction of a larger machine,” says Hammer.
CLIMATE CHANGE


Water risk as world warms 


First comprehensive global-impact project shows that water scarcity is a major worry. 


BY QUIRIN SCHIERMEIER 


When pondering the best way to study the impact of climate change, researcher Hans Joachim Schellnhuber liked to recall an old Hindu fable. Six men, all blind but thirsty for knowledge, examine an elephant. One fumbles the pachyderm’s sturdy side, while others grasp at its tusk, trunk, knee, ear or tail. In the end, all are completely misled as to the nature of the beast.

The analogy worked. Although many researchers had modelled various aspects of the global-warming elephant, there had been no comprehensive assessment of what warming will really mean for human societies and vital natural resources. But that changed last year when Schellnhuber, director of the Potsdam Institute for Climate Impact Research in Germany, and other leading climate-impact researchers launched the Inter-Sectoral Impact Model Intercomparison Project. This aims to produce a set of harmonized global-impact reports based on the same set of climate data, which will for the first time allow models to be directly compared. Last month it published its initial results in four reports in Proceedings of the National Academy of Sciences. These suggest that even modest climate change might drastically affect the living conditions of billions of people, whether through water scarcity, crop shortages or extremes of weather.

The group warns that water is the biggest worry. If the world warms by just 2 °C above the present level, which now seems all but unavoidable by 2100, up to one-fifth of the global population could suffer severe shortages.

“Water and all that relies on it, from food to sanitation and public health, is an emblematic aspect of climate change whose urgency people tend to instantly understand,” says Schellnhuber.

To assess what a warmer world might mean for the human race, 30 groups from 12 countries have run thousands of simulations, using a standardized set of scenarios for greenhouse-gas emissions. They made projections of future water availability from a set of global hydrological models in conjunction with five state-of-the-art climate models that combined projections of changes in temperature and precipitation with data on variables such as regional water cycles, river run-off and population.

The multi-model assessment suggests that, in vulnerable regions, climate change will significantly add to the problem of water scarcity that is already arising from population growth. The modellers found that climate-driven changes in evaporation, precipitation and run-off will result in a 40% increase in the number of people worldwide who must make do with less than 500 cubic metres of water per year — a commonly used threshold to signify ‘absolute’ water scarcity.

The spread between individual models was 
large — some suggested that global exposure to 
water scarcity will double; others predicted only 
modest change. But no matter what the spread, 
the greatest effects were seen between the pre- 
sent-day climate and a 2°C warmer world. 

Despite the ambiguities, the exercise will 
make climate-risk analysis substantially more 
robust, says Johan Rockström, an expert on
water resources at the University of Stockholm 
and director of the Stockholm Resilience Cen- 
tre, who was not involved in the project. 

“Impact models will never be able to pro- 
vide the level of detail that ultimately matters 
for making a city or coastline climate-proof,”
he says. “But they do serve as a first approxima- 
tion to the severe problems deficient regions 
and nations are facing.” 

Regions most at risk from water scarcity 
include parts of the southern United States, the 
Mediterranean and the Middle East. By con- 
trast, India, tropical Africa and high latitudes in 
the Northern Hemisphere can expect to receive 
more water in a warming world. 

The projected changes in water availability 
have knock-on effects in other areas that rely on 
water. The group that modelled the response of 
crops to climate change found negative impacts 
on yields of major crops in many agricultural 
regions.

In addition, drought conditions are likely to become more frequent and severe in some parts of South America, western and central Europe, central Africa and Australia, another project team reports. Flood risk is less clear-cut, but river-flow simulations from global hydrology and land-surface models did show an increase in flood hazard in more than half of the world.

Despite their uncertainty, the findings are “a stark reminder” that even moderate warming has the potential to cause severe natural, social and economic disruptions, says Rockström. “We are facing problems that result in domestic instability and migration.” Rethinking international trade with a view to giving the most needy nations better access to the global food market will be essential, he says.


Water scarcity in parts of Africa could become worse, according to a complementary set of climate projections. 


Uncertainty, adds Schellnhuber, is no excuse for inaction. “Those who might say, ‘Come back when you’ve narrowed down the risk’ should be reminded that climate change is a treacherous gamble,” he says. “We don’t quite know the odds, but the chance of losing heavily might be a lot bigger than many tend to think.”
X-ray source left without home


No plans to build next-generation accelerator despite large investment by US agency. 


BY EUGENIE SAMUEL REICH 


Accelerator physicists have a vision: an energy-efficient X-ray source that can make high-resolution movies of molecules in chemical reactions. And the US National Science Foundation (NSF) has backed the dream — since 2005, it has invested more than US$50 million to develop such a source, most likely beneath the campus of Cornell University in Ithaca, New York.

But there is one big problem: despite the 
inflow of cash, no US government agency has 
any plans to build the machine. 

The source, called an energy recovery 


linear accelerator (ERL), would be a hybrid 
of a synchrotron, in which electrons emit 
X-rays while whirling around a ring, and a 
free-electron laser, in which straight beams of 
electrons are induced to produce bright pulses 
of X-ray light. 

The Cornell project is currently receiving $27 million in a single award from the NSF’s materials division — by far the division’s largest grant for instrument development. But in July, the ERL concept was ranked the lowest of three potential next-generation X-ray sources by an advisory panel to the US Department of Energy. And in December, officials at the NSF told Nature that the agency has no plans to move forward with construction.

Despite all this, Thomas Rieker, the NSF 
programme manager for the ERL materials 
grant, says that the research effort has been 
a success, providing component designs that 
would allow an accelerator to be built quickly. 
“We wanted to keep our options open,” he says. 
“That was the impetus for funding it.”

An NSF advisory panel had strongly recom- 
mended in 2008 that the NSF invest in an ERL. 
So why the turnaround? Agency officials now 
say that the NSF’s priorities and the budgetary 
climate have changed, and that a machine cost- 
ing upwards of $1 billion would not be a good 
use of taxpayers’ money.
ELECTRONS IN RECOVERY
Cornell University in Ithaca, New York, wants to convert an existing synchrotron into an energy recovery linear accelerator, but it is unlikely that either the US National Science Foundation or the Department of Energy will build it.
[Diagram labels: existing electron-storage ring; X-ray beamlines]

Some physicists are expressing frustration over seeing so much research money apparently going nowhere. “The NSF should really decide if there's a real need for this in the country,” says Sunil Sinha, a condensed-matter physicist at the University of California, San Diego, who advised on the energy-department panel.

The idea for an ERL was developed in 1965 
by Cornell physicist Maury Tigner. It involves 
injecting electrons into a linear accelera- 
tor (linac) and then wiggling the particles to 
prompt the emission of X-ray pulses. The 
energy-recovery aspect comes from a loop that 
ushers the electrons gently around to enter the 
linac a second time. Their arrival is timed so 
that their energy is transferred to a new bunch 
of electrons that will then be accelerated. 

The approach has several advantages. 
For starters, it would be vastly more energy- 
efficient than a free-electron laser, which 
recovers no energy. That makes it practical 
to keep electrons streaming continuously,
rather than in widely separated
bunches. An ERL can
also focus its electron
beam, and hence the resulting X-rays, to a 
tighter spot than the beams in current syn- 
chrotron rings, which spread out as they lose 
energy going around in circles. This would 
allow for more-advanced studies of the atomic 
energy levels in materials. 

Japan and the United Kingdom have both 
expressed interest in building an ERL, and 
there is a small demonstration version of an 
infrared ERL at the Thomas Jefferson National 
Accelerator Facility in Newport News, Vir- 
ginia. But Cornell’s plan is the most advanced 
X-ray ERL effort in the United States. 

Grant documents stress that the ERL 
research is not site-specific, meaning that it 
could feed into projects elsewhere. But most 
experts think that, if an ERL gets built, it would
be at Cornell, where it could reuse the tunnels
of an existing NSF-funded X-ray light source,
the Cornell High Energy Synchrotron Source 
(see ‘Electrons in recovery’). “We feel the con- 
struction of an ERL can go right ahead,” says 
Georg Hoffstaetter, an accelerator physicist 
who is leading the Cornell effort. 

The capabilities of an ERL would overlap 
with those of other planned light sources. In 
California, the Department of Energy has 
plans to build a free-electron laser, perhaps by 
upgrading the Linac Coherent Light Source at 
the SLAC National Accelerator Laboratory in 
Menlo Park, California. This machine would 
provide images of materials with unprec- 
edented resolution in space and time, using 
fast pulses of high-energy X-ray beams. 

Pulses of X-ray light from an ERL would not 
be as fast, but they would be gentler, and nearly 
continuous — more appropriate for probing 
sensitive samples such as biological speci- 
mens. However, next-generation ring-shaped 
light sources, such as an upgrade planned at 
the Advanced Photon Source near Chicago, 
Illinois, will also stream continuous light. 
Although less bright and lower in energy than 
an ERL, such sources would still prove useful 
for biological imaging. 

The energy department’s decision to go 
with the other machines will make it harder 
for the ERL to justify itself scientifically, says 
Paul Evans, a materials scientist at the Univer- 
sity of Wisconsin—Madison. “Defining the niche 
they're headed for is the critical challenge.” 

Even if the ERL is not built, Cornell scien- 
tists say that the research has been useful. Their 
work is aiding design of superconducting cavities
for the Department of Energy’s future free-electron
laser, which could also one day have
energy-recovery loops tacked on. Cornell has 
also developed a high-current electron gun 
that could be used in other accelerators to 
generate X-rays or to study particle collisions. 

But although he takes satisfaction in the 
spin-off possibilities, Hoffstaetter is not ready 
to give up on the ERL’s construction. “The ERL 
is a wise investment,” he says.


SOURCE: CORNELL UNIV.

What to expect in 2014


Nature takes a look at what is in store for science in the new year. 


TRANSGENIC MONKEYS 
Several research groups, including a team 
led by geneticist Erika Sasaki and stem-cell 
biologist Hideyuki Okano at Keio University 
in Tokyo, hope to create transgenic primates 
with immune-system deficiencies or brain 
disorders. This could raise ethical concerns, 
but might bring us closer to therapies that are 
relevant to humans (mice can be poor models 
for such disorders). The work will probably 
make use of a gene-editing method 
called CRISPR, which saw rapid 
take-up last year. 


SPACE PROBES 

The European Space Agency’s 
Rosetta spacecraft could become 
the first mission to land a probe 
on a comet. If all goes well, it will
land on comet Churyumov- 
Gerasimenko in November. Mars 
will also be a busy place: India’s 
orbiter mission should arrive at the 
planet in September, about the same 
time as NASA’s MAVEN probe. 
And NASA’s Curiosity rover should
finally make it to its mission goal, 
the slopes of the 5.5-kilometre- 
high Aeolis Mons, where it will 
look for evidence of water. Back 
on Earth, NASA hopes to launch 
an orbiter to monitor atmospheric 
carbon dioxide. 


NEURAL FEATS 

Neurobiologist Miguel Nicolelis at Duke Uni- 
versity in Durham, North Carolina, has devel- 
oped a brain-controlled exoskeleton that he 
expects will enable a person with a spinal-cord 
injury to kick the first ball at the 2014 football 
World Cup in Brazil. Meanwhile, attempts are 
being made in people with paralysis to recon- 
nect their brains directly to paralysed areas, 
rather than to robotic arms or exoskeletons. In 
basic research, neuroscientists are excited about 
money from big US and European brain initia- 
tives, such as Europe’s Human Brain Project. 


NOVEL DRUGS 

In the pharmaceutical industry, all eyes are 
on trial results from two competing antibody 
treatments that harness patients’ immune sys- 
tems to fight cancer. The drugs, nivolumab 
and lambrolizumab, work by blocking pro- 
teins that prevent a person’s T cells from 
attacking tumours. In early tests, the drugs 
evoked a better level of response in patients
than ipilimumab, a similar therapy that was
launched in 2011 to treat advanced melanoma. 


RENEWABLE REVOLUTION 

Semiconductors known as perovskites convert 
light energy into electricity. They are cheap 
to build and have already shown conversion 
rates of more than 15% (a leap from 4% when 
the feat was first reported in 2009). Expect to 
see still-higher efficiencies this year, perhaps
reaching 20%, the same as the lower end
of existing commercial silicon-based photo-
voltaics. A team at the University of Oxford,
UK, also hopes to make lead-free perovskites.

An artist’s impression of the European Space Agency’s Rosetta probe,
which aims to be the first to land on a comet.


HIV HOPE 

In 2013, two research teams showed that 
‘broadly neutralizing’ antibodies that target 
an array of HIV types quickly cleared an HIV- 
related virus in monkeys. The therapy will be 
tested in people who carry HIV, with results 
expected in the autumn. Meanwhile, last year’s 
curing of a baby born with the virus might 
lead to wider trials of the technique used: high 
doses of antiretroviral drugs given at birth. 


MINIATURE SEQUENCER 

Technology that rapidly sequences DNA as it 
is fed through a ring of proteins, known as a 
biological nanopore, will hit the market this 
year after decades of development. Oxford 
Nanopore Technologies in Oxford, UK, aims
to release the first data from a disposable
sequencer the size of a memory stick, which it
is sending to scientists for testing. It promises 
to read longer strands of DNA than other tech- 
niques (potentially useful in sequencing mixed 
samples of bacterial DNA, for example), and to 
show results in real time. 


A BETTER CLIMATE

The Intergovernmental Panel on Climate 
Change will complete its fifth 
assessment report by November. 
The findings of working groups II 
and III will focus on the impacts of
climate change, and on how socie- 
ties can adapt to or mitigate those 
effects (working group I published 
its findings last year). Away from 
formal negotiations, United Nations 
secretary-general Ban Ki-moon is 
hoping for “bold pledges” on emis- 
sions at a summit in New York in 
September. In research, a large 
carbon capture and storage project 
in Canada — the Can$1.24-billion 
(US$1.17-billion) Boundary Dam 
coal power-plant in Saskatchewan 
— begins commercial operation 
in April. 


MAKING WAVES 

The European Space Agency’s 
Planck satellite team should 
release data on how the polariza- 
tion of photons from the Universe's 
cosmic microwave background varies across 
the sky. This esoteric pattern is thought to have 
been generated by ‘inflation’, the rapid expansion
of the Universe after the Big Bang. If it can
be detected, its details could provide evidence 
of relic gravitational waves, thought to have 
perturbed space-time in the early Universe. 


STEM-CELL REGENERATION 

A Japanese team will start the first clinical trials 
using induced pluripotent stem cells this year — 
but don't expect results anytime soon. And bio- 
technology firm Advanced Cell Technology in 
Santa Monica, California, says that it will release 
data from two trials using human embryonic 
stem cells — the only two to gain approval from 
US drug regulators. These two studies involve 
injecting stem-cell-derived retinal cells into the 
eyes of around 30 people with one of two forms 
of non-treatable degenerative blindness.

RICHARD TREMBLAY HAS TRACED THE ROOTS OF CHRONIC AGGRESSIVE BEHAVIOUR BACK AS FAR AS INFANCY. NOW HE HOPES TO GO BACK FURTHER.

BY STEPHEN S. HALL


Hochelaga was the original Iroquoian name for the village
that ultimately became Montreal, but it is also the name of a
rough-hewn French-Canadian
neighbourhood located east of — and a world 
away from — the cosmopolitan city centre. 
The district's tidy two- and three-storey brick 
duplexes, adorned with Montreal's character- 
istic wrought-iron staircases, predominantly 
house families that have, because of poverty 
and lack of education, never quite attained 
thriving middle-class status. 

During the 1980s, public-school officials 
identified Hochelaga and many other impov- 
erished neighbourhoods in the eastern part 
of Montreal as places where kindergarten 
children disproportionately displayed severe 
behavioural problems, such as physical aggres- 
sion. The school system asked a young Univer- 
sity of Montreal psychologist named Richard 
Tremblay for help. 

“Their parents didn’t have a high-school 
diploma, and many of the mothers had their 
first child before the age of 20,” Tremblay 
says of the families he began to study, as he 
walks along Rue Ontario in Hochelaga on a 
sunny afternoon in September. Those were
the women, he adds, “most at risk of having
children who have problems”. 

Over the past three decades, Hochelaga and 
similar neighbourhoods have served as living 
laboratories in the study of the roots of aggres- 
sion. Since 1984, Tremblay and his collabora- 
tors have followed more than 1,000 children 
from 53 schools in the city from childhood into 
adulthood. And in 1985, he initiated a ground- 
breaking experiment in which some families 
of at-risk children were given support and 
counselling to help curb bad behaviour. His 
research overturned ideas about when aggres- 
sive behaviour first emerges, and showed that 
early intervention can deflect children away 
from adult criminality. 

The idea that a nurturing environment 
provides better outcomes for children hardly 
qualifies as news, but Tremblay has taken this 
idea in a provocative direction in the past ten 
years. He has joined researchers at McGill 
University in Montreal and the US National 
Institutes of Health (NIH) in Bethesda, Mary- 
land, to investigate how nurturing or adverse 
environments might exert their effects at the 
molecular level, influencing gene expression 
through a mechanism known as epigenetics.
Tremblay’s Canadian cohorts are part
of a growing trend for using longitudinal
studies, which follow the same individuals 
over an extended period of time, to look for 
epigenetic signatures that might affect health 
and behaviour later in life. Research in this 
area is still preliminary — and not without 
its critics — but Tremblay believes that a firm 
grasp of early epigenetic effects could inform 
interventions to influence everything from 
obesity to mental illness. 

“There is a body of evidence, from natural 
experiments and actual experiments, showing 
that early-life experiences affect long-term 
outcomes such as crime, health and wages,” 
says James Heckman, a Nobel-prizewinning 
economist at the University of Chicago in Illi- 
nois who is currently working with Tremblay 
on an early-intervention study with at-risk 
pregnant mothers in Dublin. The work of 
Tremblay and others, he says, “has established 
a firmer biological basis for how early-life 
experiences affect these processes”. 


ROGER LEMOYNE/REDUX PICTURES/EYEVINE 


Tremblay’s own early life revolved around 
sport. His father, Wilfrid Tremblay, played 
Canadian football from 1938 to 1951, and 
Richard was an accomplished ice-hockey 
goalie. When Jacques Plante, the Hall of Fame 
goalie for the Montreal Canadiens, suffered 
an injury during the Stanley Cup Playoffs in 
1961, a team representative called the then-17-year-old
Tremblay asking if he could report to
the minor league practice rink the next morn- 
ing. Tremblay, soft-spoken and mild-man- 
nered, allows that he was “invited to join” the 
most illustrious franchise in Canadian sports, 
but concluded that he was too small to play 
at the professional level. He decided to attend 
college instead. 

Tremblay studied physical education at 
the University of Ottawa. But before his final 
year, he read a cult novel by J. R. Salamanca 
called Lilith (Simon & Schuster, 1961), about a 
recreational therapist who falls in love with a 
young female patient at a psychiatric hospital. 
To a naive 20-year-old, the work sounded fascinating,
and when he returned to college that
autumn he applied for a job as a recreational 
therapist at a high-security psychiatric hospital 
in Joliette, Quebec. He quickly found himself 
in over his head, working with convicted mur- 
derers and violent criminals. But it was during 
this time that he first started to wonder about 
the psychology of aggression. “It shows how a 
novel can change a life,” he says. 

The hospital agreed to send him to get a 
master’s degree in psychology, which he pur- 
sued at the University of Montreal. As Trem- 
blay likes to say: “The first thing I did after 
finishing my master’s degree was to go to jail 
for three years.” That was the Pinel Institute, a 
new maximum-security psychiatric hospital 
in Montreal. Most of the people there, he says, 
“had killed someone or were dangerous to the 
point of killing themselves, or others”. Despite
the danger, he found himself going to work on
his days off to play sports with the residents. “I 
loved it,” he recalls. 

Then, in 1971, the University of Montreal 
decided to create a school focused on children 
with behavioural problems. The university 
wanted to hire Tremblay, one of the most 
promising students to come out of its psy- 
chology programme, to join the faculty. But 
he needed a PhD first, so the university paid 
for his training at the University of London's 
Institute of Education. 

That turned out to be a defining experience. 
Tremblay arrived in London with a sheaf of 
Rorschach blots and a grounding 
in psychoanalysis, but there he 
was exposed to the ‘longitudinal’ 
philosophy of pioneering human- 
growth biologist James Tanner, 
child psychiatrist Michael Rutter 
and others. He came away with a 
lesson that has informed the rest of 
his scientific career: the best way to 
study any aspect of human devel- 
opment is to conduct longitudinal 
studies. He threw away his Ror- 
schach blots and, in the late 1970s, 
headed back to Montreal. 


AGGRESSIVE START 

By then, Tremblay was eager 
to launch his own longitudinal 
study. He got his chance in the early 1980s. 
School officials came to him with the problem 
of hyperactive, physically aggressive kinder- 
garten boys. He had never worked with chil- 
dren before and never imagined doing so, but 
he recognized it as an opportunity to explore 
the origins of aggressive behaviour. “The idea 
became very clear,” he says. A longitudinal
study of kindergarten children would give 
him a chance to link childhood behaviours 
with adolescent and adult outcomes. 

In 1984, he started tracking boys from doz- 
ens of schools. Funding was initially provided 
for three years, but nearly three decades later 
Tremblay and his colleagues continue to fol- 
low many of the men involved. They have 
published more than 160 papers on the group. 

Just one year in, when the boys were seven 
years old, Tremblay obtained a grant to add a
randomized, controlled experimental interven- 
tion. Teams of four psychologists would visit 
the families of about 50 boys every two weeks. 
They counselled parents on identifying and 
correcting aggressive behaviour, and trained 
teachers to do the same. In addition, they 
attempted to socialize unruly boys, and they 
integrated problematic boys with well-behaved 
children to provide positive peer role models. 

16 | NATURE | VOL 505 | 2 JANUARY 2014

The Montreal intervention began at a
time known informally among criminolo-
gists as the ‘nothing works’ era, when there
was widespread pessimism about the poten-
tial to rehabilitate juvenile delinquents and
adult criminals. Tremblay’s intervention was
labour-intensive and extremely expensive, and
he recalls fretting that he was spending mil- 
lions of dollars on a study but might end up 
with nothing to show for it. “I guess I lost hope 
— in working with juvenile delinquents — that 
we could make a difference,” he says. 

The intervention lasted about two years, but 
the results would take much longer to become 
apparent. One of the first people to see hints 
that it was working was Joan McCord, a crimi- 
nologist at Temple University in Philadelphia, 
Pennsylvania. McCord had a reputation for 
ferreting out data that challenged conventional 
wisdom, most notably when she demonstrated
in the 1970s that a famous US longitudinal
experiment — the Cambridge-Somerville 
Youth Study, in which juvenile delinquents 
were mentored and supported — had actually 
harmed the young men it had aimed to help¹.
Conversely, the Montreal intervention seemed 
to be working as intended. With each follow- 
up assessment, boys in the intervention arm 
displayed not only less delinquent behaviour 
than controls, but also better school perfor- 
mance, lower consumption of drugs and alco- 
hol, and better social skills. 

Data gathered 15 years after the interven- 
tion ended revealed that it produced persis- 
tent positive effects. The boys whose families 
received support had a 46% graduation rate as 
opposed to 32% for controls. And, at the age of 
24, fewer of them had criminal records: 22%,
versus 33% for controls².

But Tremblay wasn’t just seeking ways to 
mitigate bad behaviour — he was looking to 
uncover where it began. In the mid-1990s, 
he began to collaborate with Daniel Nagin, a 
criminologist at Carnegie Mellon University 
in Pittsburgh, Pennsylvania. Nagin applied 
a more sophisticated statistical metric to the 
burgeoning Montreal data set. The results, 
published in 1999, made it clear that the trajectory
towards antisocial behaviour and criminality
in adolescence begins very early in life³.
Most children exhibit decreasing aggression 
between the ages of 6 and 15: they learn to 
control their aggressive impulses. Only about 
4% of the boys displayed highly aggressive
behaviour in early childhood that continued
into their teens. The roots of physical aggres- 
sion — and, by extrapolation, the origins of 
violent behaviour later in life — lie before the 
age of six, says Nagin. That is, before Trem- 
blay’s kindergarten cohort even began. 

Even as Nagin and Tremblay were analys- 
ing the original Montreal data, Tremblay had 
begun another longitudinal study designed 
to look at aggression before kindergarten. It 
was a birth cohort based in Quebec, and the 
resulting data suggested that aggressive behav- 
iour was evident at 17 months and peaked at 
around 42 months⁴. This and later work culminated
in Tremblay’s ‘original sin’
hypothesis: that physical aggression
is the default setting in human
behaviour⁵. It peaks between the
ages of two and four, and is usu- 
ally socialized out of children by 
the time they enter school (see 
‘Aggression regression’). “We took
the view that violence, and physi- 
cal aggression, is a part of us as a 
species,” says Nagin, “so the issue 
is not how we learn it, but rather 
how we learn to control it.” 

Many criminologists dismissed 
the findings. They argued not that 
the idea was wrong, but that it was 
irrelevant — that chronic child- 
hood aggression is trivial com- 
pared with murder and rape in adulthood, 
and that the former does not explain the lat- 
ter. Most still focus primarily on delinquency 
during adolescence, and for good reason, says 
sociologist Robert Sampson at Harvard Uni- 
versity in Cambridge, Massachusetts. “Early 
childhood is centrally important, but it’s not 
determinative, because there are still changes 
[in behaviour] later on.”

Yet the Montreal and similar longitudi- 
nal studies show that heightened physical 
aggression at a young age correlates with seri- 
ous antisocial behaviour in adolescence and 
adulthood, says Tremblay. He is fond of citing 
the view that Saint Augustine offered some 
1,600 years ago: “It is not the infant’s will that 
is harmless,” he wrote, “but the weakness of 
infant limbs.” 


MARKING TIME 

With Saint Augustine’s headstrong infants in 
mind, Tremblay increasingly pondered the 
effects of the environment at earlier and earlier
ages. Like many researchers studying behav- 
iour, he had looked into what role genes might 
have in aggression, but he was dissatisfied. 
Genetics did not tell the whole story. Tremblay 
was primed, therefore, to hear about the work 
of Moshe Szyf, a cancer biologist at McGill, at 
a small Vancouver meeting in 2004. 

Szyf had been tracking the addition and 
removal of methyl groups to DNA, which 
can silence or activate genes. Scientists were 
interested in whether these methylation marks
might allow the environment to influence
gene expression over an organism's lifetime. 
Michael Meaney, a developmental neurobiol- 
ogist also at McGill, collaborated with Szyf to 
show that newborn rat pups generously licked 
and groomed by their mothers had differ- 
ent patterns of DNA methylation from those 
that received less maternal attention⁶. These
changes reached the brain, where the methyla- 
tion pattern altered the activity of a gene that 
plays a central part in the animal's 
response to environmental stress. 
Maternal nurture, Szyf argued, was 
a form of ‘environmental program- 
ming’ that altered the activity and 
function of genes in ways that per- 
sisted throughout life. 


For Tremblay, it was “as if the roof
blew off” the room. The McGill
experiments suggested a biological
explanation for what he had been
tracking for 20 years. As he walked 
to dinner with Szyf that evening, 
Tremblay pressed for a possible col- 
laboration. 

Human studies of this sort were 
uncharted territory. So Tremblay 
initiated a parallel line of animal 
research with Stephen Suomi, who 
heads the primate laboratory at 
the NIH’s Eunice Kennedy Shriver 
National Institute of Child Health and 
Human Development in Bethesda.
Both scientists had noted behavioural 
similarities between the chronically 
aggressive, hyperactive boys in the 
Montreal study and a group of aggressive mon- 
keys that Suomi had raised under conditions of 
early maternal deprivation. Tremblay, Suomi 
and Szyf set out to run DNA-methylation stud- 
ies on two sets of monkeys: a group nurtured by 
their mothers, and another deprived of mater- 
nal nurturing from shortly after birth. It took 
nearly a decade of difficult molecular-biology 
work headed up by Nadine Provençal at McGill,
but in the past year or so, the researchers have 
begun to publish their findings. 

The first primate study found distinct differ- 
ences in DNA-methylation patterns between 
nurtured monkeys and those separated from 
their mothers⁷. The epigenetic residue of post-
natal adversity was broad, according to Suomi,
affecting more than 4,000 genes — about one- 
fifth of the genome — and tending to cluster 
in certain chromosomal regions. Moreover, 
the epigenetic modifications seemed to alter 
expression of a gene that Suomi’s group had 
shown to be crucial to the function of the
neurotransmitter serotonin⁸, low levels of which
have been associated with elevated stress and 
aggression in humans. “These are not random 
changes,” Suomi says. “They follow particu- 
lar pathways.” The marks remained stable in 
monkeys up to 8 years old — an age roughly 
equivalent to 30 in humans. 

Although the team was able to test both brain and white blood cells from the monkeys, they only had access to blood from the men of the Montreal cohort. Even so, studies are starting to offer a complementary human picture. In July, Szyf and Tremblay reported that men with a history of chronic aggression dating back to kindergarten had significantly lower blood levels of immune molecules called cytokines than normal controls from the cohort⁹. These molecules are typically activated during the body’s response to stress, and animal studies have demonstrated a link between aggression and lower levels of a cytokine called interleukin-6, which was also lower in the chronically aggressive men. In a second study, Szyf and Tremblay showed that members of the Montreal Longitudinal Study with a long-standing history of aggression had a distinctly different pattern of DNA methylation in the genes encoding the cytokines, compared to men with a less aggressive behavioural profile¹⁰.

[Figure: A study of more than 10,000 Canadian children pointed to three basic trajectories for physical aggression. Most become less aggressive between the ages of 2 and 11 years, but a minority maintain a high level of aggression throughout childhood. Figure label: 52.2% of children.]

The early human research has its shortcomings. For starters, the sample size is very small: only seven males with a history of aggression could be tracked down from the cohort for testing, along with 25 controls. And white blood cells are by no means the same as neurons, although Suomi notes that there is considerable overlap between the methylation patterns of the two cell types in the primate studies. Moreover, many researchers remain cautious about recent human epigenetic studies. Attributing behavioural consequences to DNA methylation may be overreaching, says Adrian Bird, a geneticist at the University of Edinburgh, UK. “These are all correlations,” he says, “and often the magnitude of the change is very small indeed.”

Tremblay is the first to admit that the story is far from simple: hundreds of genes are involved, and any single expression change is probably subtle. Yet, he says, “it seems relatively clear that there are large differences in DNA methylation between those who have a history of chronic aggression compared to those who have normal development”. He is convinced that the benefits of nurture merit early intervention programmes, regardless of the uncertainties in the biological part of the story. And he thinks that earlier intervention may produce even better results. “If we support these parents during pregnancy and if we help these women have a better lifestyle during pregnancy, with less stress, it should affect brain development, and these children should be better able to learn how to control their aggressive behaviour,” he says.

He is already testing that hypothesis. In 2007, he accepted a ten-year appointment at University College Dublin, where he is assisting on several early-childhood longitudinal studies. One, called Preparing for Life and headed by economist Orla Doyle, is testing a preventive intervention in 200 pregnant women from a disadvantaged area of north Dublin. During their pregnancies, the women received intensive home visits covering everything from nutrition, smoking, alcohol and drug counselling to support in marital relationships. And the support continues until the children reach the age of four. James Heckman, who is also collaborating on the study, says that the plan includes future epigenetic studies of the cohort.

“To solve the aggression problems, which are mainly a male problem, we need to focus on females,” Tremblay says. “If you ameliorate the quality of life of women, it will transfer to the next generation.”


Stephen S. Hall is a science writer in New 
York and teaches public communication to 
graduate students in science at New York 
University.


COMMENT


CHIEN-MIN CHUNG/IN PICTURES/CORBIS 


Track flows to manage technology-metal supply


Recycling cannot meet the demand for rare metals used in digital and green 
technologies, says Andrew Bloodworth. A more holistic approach is needed. 


Demand for metals is soaring as the
global population booms and millions
of people in emerging economies
aspire to a Western lifestyle. The variety
of metals we use has also expanded as tech- 
nology has advanced. As a result, historic 
fears regarding metal scarcity and resource 
depletion have returned in the past ten years. 
Concerns focus on the future supply of 
metals such as indium, lithium, rare-earth 
elements, tellurium and germanium, all of 
which are crucial to delivering new digital 
and low-carbon energy technologies, includ- 
ing photovoltaics and electric cars. 
In 2009, the issue came to global prominence
when China reduced its exports of
rare-earth elements, as the government
sought to maintain supply to its rapidly 
expanding domestic manufacturing sector. 

Geopolitical and socio-economic risks — 
such as territorial disputes in Asia or labour 
relations in southern Africa — can interrupt 
supply because technology metals are pro- 
duced in very few locations. Commercial 
barriers compound the issue. Investment in 
these materials can be risky because they are 
difficult to extract and the markets are small, 
complex and volatile compared with those 
for iron, copper and aluminium. 

To secure supplies of metals for future 
technology, the scientific, industrial and pol- 
icy communities must work together. The 
numerous assessments that governments
have commissioned fall short. They iden- 
tify key issues, but generate lots of sterile 
argument as to whether a particular metal 
is ‘critical’. The solutions they point to are
generic and of little practical use. 
Prominent among these broad-brush 
responses has been the implication that 
the security of technology-metal supply in 
mainland Europe and the United Kingdom 
can be achieved mostly through recycling. 
Although recycling is important for man- 
aging stocks of common industrial metals, 
its application to technology metals is more 
complex. Some materials are impractical or 
impossible to retrieve after use.
More primary sources will be needed to
meet rising demand and replace lost tech- 
nology metals. To find new resources, the 
geological processes that concentrate these 
metals need to be better understood. And 
to increase efficiency and avoid unintended 
environmental impacts, the flows of indi- 
vidual metals need to be mapped from the 
ground to the end of their use. 


RARE RESOURCES 

Demand for technology metals has exploded 
in the past 40 years, with 80% of the cumu- 
lative global production of gallium, rare- 
earth elements, platinum-group metals 
and indium taking place since 1980 (ref. 1). 
Growth is expected to continue for the
foreseeable future.

Most technology metals are mined in only a 
few places. In 2011, for example, 72% of global 
cobalt came from the Democratic Republic 
of the Congo and 57% of indium originated
from China (see go.nature.com/crtooz and 
‘Metal producers’). Such metals are produced 
in low quantities. In 2011, just 72,900 tonnes 
of tungsten was extracted globally, compared 
with 45.2 million tonnes of aluminium and 
1.5 billion tonnes of crude steel.

Some studies have concluded that scar- 
city and depletion of technology metals are 
unavoidable as rising consumption will 
exceed current reserves. These apocalyptic
forecasts fail to take into account that geologi- 
cal reserves are dynamic, expanding as metal 
prices rise and extraction of lower-grade 
ores becomes economical and tractable, and 
contracting as prices fall. A combination of 
price pressures and technical advances has 
kept global reserves of most metals steady or 
growing over the past 50 years.

Because technology metals were of lim- 
ited economic interest until recently, there 
has been little imperative to look for them. 
Consequently, not much is known about 
their distribution in Earth or the natural 
processes that concentrate them. 

As knowledge improves, we will be able to 
reappraise old mining areas and explore new 
frontiers. Former mines in southwest Eng- 
land might hold promise for tungsten, for 
example, and a significant deposit of heavy 
rare-earth elements was identified in 2009 at 
Norra Kärr in Sweden. However, overcoming
public objections to new mines can be a
major challenge, especially in the developed 
world, where populations are often reluctant 
to accept the resource consequences of con- 
spicuous consumption. 


RECYCLING IS NOT ENOUGH 
Secondary metals, recycled from defunct 
products, provide valuable supplementary 
resources. But secondary stock will never 
meet growing demand. And recycling has 
technical limits. 

METAL PRODUCERS

In 2011, most cobalt came from the
Democratic Republic of the Congo, and most
indium from China. Mined amounts have risen
rapidly in the past 20 years owing to demand
from technologies.

Cobalt: Democratic Republic of the Congo 72%; Others 15%; Canada 5%; China 4%; Zambia 4%.
Indium: China 57%; Canada 11%; Japan 11%; Korea 11%; Others 10%.

From mobile phones to motor vehicles,
technology metals are used in myriad applica- 
tions. Up to 60 different elements go into the 
manufacture of microprocessors and circuit 
boards, usually in tiny quantities and often
in combinations that are not found in nature. 
Whether a metal can be recovered once a 
device is defunct depends on the element’s 
value, concentration and accessibility when 
it is combined with other materials. Precious
metals — platinum-group metals and gold — 
are the main target in the processing of used 
circuit boards. Lower-value copper, antimony 
and indium can be recovered at the same 
time. But metals such as tantalum, gallium, 
germanium and rare-earth elements are oxidized
and effectively lost in the smelter slag.
Recycling technology metals is most eco- 
nomically attractive when they are highly 
concentrated, for example in manufacturing 
scrap. Around 70% of the indium used in pro- 
ducing flat-screen displays, for example, finds 
its way into scrap, which is then recycled.
To fix bottlenecks and inefficiencies 
requires measuring technology-metal stocks 
and understanding how they flow through 
the whole supply chain — from mining to 
concentration, extractive and process metal- 
lurgy, manufacturing, use, reuse, recycling, 
dispersal and disposal. For instance, improving
recovery technology at tungsten mines
would increase the amount of the metal in 
the ore that ends up reaching the smelter (just 
75% for tungsten, in contrast to 90% for gold). 
In theory, more than 90% of platinum- 
group metals used in autocatalysts can 
be recovered. In practice, only 50-60% is 
retrieved from European scrap cars because 
many vehicles are exported second-hand to 


© 2013 Macmillan Publishers Limited. All rights reserved 


places that lack recycling facilities. Analysis 
of metal flows could show whether a scheme 
to retrieve lost catalytic converters would be 
more effective than another type of scheme, 
such as a 2011 proposal by a UK waste-man- 
agement company to recover these metals 
from road sweepings. Autocatalysts contain 
about 0.2% platinum-group metals; sweepings
contain less than 1 part per million.
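
As a rough illustration of the comparison above, here is a minimal sketch. It assumes that both quoted figures are mass fractions, takes 1 part per million as an upper bound for road sweepings, and assumes perfect recovery; the function and variable names are hypothetical and are not drawn from any of the schemes mentioned.

```python
# Illustrative only: compares the two platinum-group-metal (PGM) grades quoted in the text.
# Assumptions: both figures are mass fractions, and recovery is 100%.
# All names are hypothetical.

PPM_PER_PERCENT = 10_000  # 1% equals 10,000 parts per million


def percent_to_ppm(percent: float) -> float:
    """Convert a percentage to parts per million."""
    return percent * PPM_PER_PERCENT


def tonnes_of_feed_per_kg(grade_ppm: float) -> float:
    """Tonnes of material that hold 1 kg of metal at the given grade."""
    # 1 ppm by mass corresponds to 1 g of metal per tonne of material.
    return 1000.0 / grade_ppm


autocatalyst_ppm = percent_to_ppm(0.2)  # "about 0.2%" PGM
sweepings_ppm = 1.0                     # upper bound: "less than 1 part per million"

ratio = autocatalyst_ppm / sweepings_ppm
print(f"Autocatalysts are at least {ratio:,.0f} times richer in PGMs than sweepings")
print(
    f"Feed per kg of PGM: {tonnes_of_feed_per_kg(autocatalyst_ppm):.1f} t (autocatalysts), "
    f"at least {tonnes_of_feed_per_kg(sweepings_ppm):,.0f} t (sweepings)"
)
```

Because the sweepings figure is only an upper bound, the true gap in grade is larger than this sketch suggests. A full metal-flow analysis would also weigh collection costs and recovery rates, which the sketch ignores.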

Addressing technological barriers to 
resource efficiency in this way is a focus of 
initiatives, such as the European Innovation 
Partnership on Raw Materials — a network of 
European countries aimed at increasing the 
availability of raw materials across the region. 

However, mapping the life cycle of criti- 
cal metals is challenging. The volumes are 
low; extraction, processing and recycling 
are handled by just a few organizations; and 
commercial confidentiality can make data 
and contacts hard to find. 


ONE SYSTEM 

In the past five years, concerns over securing 
supplies of technology metals have evolved 
from near-panic over physical depletion 
and Chinese geopolitical muscle-flexing, 
to a dangerous assumption by some policy-makers
that recycling is the panacea. A more
holistic approach is needed. 

Primary and secondary sources must be 
considered as part of one system that needs 
to be wholly understood. Basic statistical data 
are crucial. Better dialogue between produc- 
ers, processors, consumers and recyclers will 
be needed. Policy-makers must assess how 
technology metals are used and combined, 
and the impact this has on the economic and 
environmental viability of recycling them. 

The benefits of securing supplies of tech- 
nology metals are clear. Improving the effi- 
ciency and reducing the environmental 
footprint of extraction and processing of 
these metals from primary sources is a major 
opportunity for industry and researchers.


Andrew Bloodworth is science director for 
minerals and waste at the British Geological 
Survey, Nottingham, UK. 

e-mail: ajbl@bgs.ac.uk 
Prince Igor
Metropolitan Opera House, New York 
Premiere 6 February 


The cosmic quest to comprehend the 
Universe has provided rich metaphorical 
pickings for drama, as Bertolt Brecht’s play 
Life of Galileo and Philip Glass’s opera Kepler 
attest. Nineteenth-century Russian composer, 
chemist and physician Alexander Borodin 
also looked to the skies — specifically, a solar 
eclipse — for inspiration. The result is the 
opera Prince Igor, based on the epic story
of the twelfth-century monarch. Playing
at New York’s Metropolitan Opera House
for the first time in almost a century, the
production is directed by Dmitri Tcherniakov 
and conducted by Gianandrea Noseda. Ildar
Abdrazakov stars in the title role. 


Fail 

Dublin Science Gallery 

7 February — 27 April 

Should we do more to celebrate failure? 
Contributors to this exhibition at the Dublin 
Science Gallery would like us to. Curated by 
Jane Ní Dhulchaointigh, the Irish inventor
of the silicone rubber Sugru — used for 
everything from fixing to sculpting — the 
show features 21 objects selected by 
household names in fields including science 
and engineering. The items personify
the failure survived or exploited by these
individuals on their way to success. Be 
prepared to re-examine flops and losing 
streaks and, as Irish playwright Samuel 
Beckett had it, to learn how to “fail better”. 


Britain: One Million Years of the Human Story
Natural History Museum, London
13 February — 28 September

About 900,000 years ago, mammoths
lumbered through what is now Kensington
in central London — along with the first
early humans to reach Britain. A portal to
that distant world opens in mid-February at
the Natural History Museum. The exhibition
showcases landmark findings from the
Ancient Human Occupation of Britain project,
a collaboration between palaeontologists,
archaeologists and geologists to craft a
calendar of early human activity across the
isles. Bringing together objects such as the
world’s oldest wooden spear and the skull
of Britain’s earliest known Neanderthal, the
show will reveal a bigger picture — waves
of humans arriving over the tumultuous […]

Inspired by a solar eclipse: Prince Igor, coming to the New York Metropolitan Opera in February.

It promises to be a heady year for science in culture:
fans can steep in the sumptuous world of colour,
unpeel the upside of failure, explore neural pathways,
revisit the First World War, mend a rip in space-time,
go pterosaur-spotting and traverse a mammoth-ridden
nation. Daniel Cressey investigates.


[…]
5 April 2014-5 January 2015

[…] a vast gallery in a temporary exhibition at
the American Museum of Natural History.
Pterosaurs, the flying reptiles that dominated
the airspace from 220 million to 65 million
years ago, will be explored through fossils,
models and interactive displays, and contrasted
with bats, birds and other animals that have
evolved the remarkable ability to fly. This
family-oriented show will celebrate these
animals, which had wingspans that could — in
the case of Quetzalcoatlus northropi — exceed
10 metres.


Pterosaurs such as Tupandactylus ruled
the skies in the Cretaceous period.


Transcendence
Director Wally Pfister
Opens 18 April

The idea of the ‘technological singularity’
has been knocking around for decades,
envisioned by mathematician John von
Neumann and futurists including Ray
Kurzweil as the moment when advances
in artificial intelligence tip humanity into a
radical new mode of being. In this
much-anticipated science-fiction blockbuster
directed by Wally Pfister, that techno-epiphany
reportedly arrives when a computer scientist
uploads the brain of her assassinated
husband, an artificial-intelligence researcher,
into a computer. Does this brave new
consciousness herald utopia or dystopia?
Rebecca Hall, Johnny Depp and Morgan
Freeman star.


Colour
National Gallery, London
18 June - 7 September

From ochre to neon optics, colour has
obsessed visual artists from prehistory on,
although its maintenance has troubled
conservators since at least the nineteenth
century. In the National Gallery’s 700-year
overview of hue in paintings, glass, textiles
and ceramics — which includes substantial
input from the gallery’s groundbreaking
science department — the experimentation
of colourists from the early Renaissance
to the Impressionist era forms the base
layer. The show explores the production of
pigments, from the grinding of minerals to the
formulation of acrylic polymers, as well as
the challenges in rendering colours.
The trade routes that brought pigments
from caravan to canvas provide a fascinating
historical context.


First World War Galleries
Imperial War Museum London
Opening July

To coincide with a huge programme of events commemorating the 100th anniversary of the
start of the First World War, the Imperial War Museum London is set to open these themed
galleries. Through interactive digital displays, audio and objects, visitors will explore the
rapid escalation in industrial production that ensured that troops were fed and armed. The
galleries will also depict a soldier’s daily life, from psychological trauma to grappling with
military technologies such as tanks and aeroplanes. The museum’s refurbished atrium will
display big hardware including a V2 rocket, Spitfire plane and T34 tank.

A simulation of how the Imperial War Museum London’s atrium will look when it opens in July.


The Valley of Astonishment 
Young Vic, London 
20 June - 12 July 2014 


Theatrical legend Peter Brook has long
been inspired by the wilder shores of
neurology and mystical Islam. In The Valley of
Astonishment, Brook and co-director
Marie-Hélène Estienne mix the two. True stories of
people with synaesthesia — a neurological
condition in which the senses are mixed, so
colours might be tasted or heard — are woven
into elements from the Sufi poet Attar’s
sublime twelfth-century epic The Conference
of the Birds, from which the play’s title derives.
The formidable cast includes Kathryn Hunter 
and Marcello Magni. 


From Atlantis to Today: Person, Nature, Catastrophes
Reiss-Engelhorn Museum, Mannheim, Germany 
7 September 2014-1 March 2015 


Why do we mythologize catastrophes? This 
major exhibition takes as its unusual theme 
how different cultures have responded to 
natural disasters from antiquity until the 
present day. Simulations allow visitors to 
experience the sensations of the stranded as, 
for instance, Hurricane Katrina hammered 
and flooded New Orleans in 2005. And 
hundreds of objects related to disaster are 
on display, including a statue of Roman 
emperor Titus Augustus, who helped 
Pompeii’s survivors after the AD 79 eruption 
of Vesuvius buried the town, and was then 
condemned by the populace for supposedly 
triggering the disaster. 


Grand reopening 
Wellcome Collection, London 
October 


After a £17.5-million (US$29-million) 
expansion and refurbishment, London’s 
Wellcome Collection — a showcase for the 
links between medicine, art and daily life — 
will emerge radically recast and with several 
new spaces. A thematic gallery will host 
long-term exhibitions — the first taking on
the pioneers of sex research — and the
current gallery will be spruced up before 
relaunching in October with a show on 
forensics. The glorious reading room will 
be made open to all as a place in which 
objects gathered by medical collector 
extraordinaire Henry Wellcome keep 
company with rare books, art and more. 


Interstellar 
Director Christopher Nolan 
Opens 7 November 


In a future near you, societal order has
collapsed and the remnants of NASA are
cobbled together to investigate a tear
in the fabric of the Universe. Interstellar,
already touted as one of the big films of
2014, will be a long-awaited cinematic
outing for the ideas of theoretical
physicist Kip Thorne, who advised on
the venture. Breaking away from the
idea that space exploration is limited to
the Solar System, Thorne plays with the
possibility of time travel using wormholes
— ‘warps’ in space-time that serve as
shortcuts to distant parts of the Universe.
Christopher Nolan, who bent minds with
2010's heist-within-a-dream-within-a-dream
thriller Inception, directs.


Russia’s Space Quest 

Science Museum, London 

Autumn 2014 

In 2014, Russia and Britain celebrate a 
joint year of culture, and the programme’s 
flagship event will be this showing of a 
remarkable collection of Soviet space 
artefacts. Visitors will be able to savour 
the sight of the capsules that carried 
cosmonauts aloft and the rocket engines 
that powered them, alongside smaller 
items from personal memorabilia to 
spacesuits. A collaboration with Moscow’s 
Memorial Museum of Cosmonautics and
the Russian space agency, Roscosmos,
the show represents the most significant
collection of such items ever permitted to
leave Russia.

Soviet propaganda celebrated the first
human trip into space, by Yuri Gagarin.


Daniel Cressey is a reporter for Nature 
in London. Additional reporting by 
Alison Abbott and Barbara Kiser. 
Chasing universes


Andrew Liddle contemplates an accomplished
explication of the multiverse.


Our Mathematical Universe: My Quest
for the Ultimate Nature of Reality
MAX TEGMARK
Knopf: 2014.

NATURE.COM: for more on the multiverse, see go.nature.com/mqc2jd


Having trouble understanding the
Universe? Try this instead: imagine
10^500 possible universes, all different,
and consider our place within this ensemble.
Not randomly chosen, because our location
should satisfy some basic conditions, such as
habitability for intelligent species able to ask
about their place in the cosmos. Can such a
multiverse help us to fathom our Universe?
Cosmologist Max Tegmark has written
an engaging and accessible book, Our
Mathematical Universe, that grapples with this
multiverse scenario. He aims initially at the
scientifically literate public, but seeks to take
us to — and, indeed, beyond — the frontiers
of accepted knowledge. His explication of
these ideas is more ambitious and individualistic
than books on this topic by Leonard
Susskind and Alex Vilenkin, for instance.
Multiverse theory stands in stark opposition
to the belief that there should be some
reason, perhaps a Theory of Everything, that
determines physical laws such as the types
of particle that exist and the ways in which
they interact. In the multiverse picture,
it is all an accident.
What we know as ‘constants’ of nature, such
as the strength of gravity or the proton-to-neutron
mass ratio, happen to have particular
values here, but in distant regions beyond
our sight they may take other values and
produce universes with very different properties
— perhaps an absence of complex atoms and
molecules, and hence of life.

Once seen as a fringe interest of dubious 
scientific validity, the multiverse has devel- 
oped a serious following. Steven Weinberg 
used it in 1987 to predict that our observable 
Universe ought to have a non-zero cosmo- 
logical constant, probably of a magnitude 
great enough to accommodate the accelera- 
tion of the Universe's expansion. To every- 
one’s surprise, this was verified a decade later 
through observations of distant supernovae 
by two teams of astronomers. Those who led 
the work, Saul Perlmutter, Adam Riess and 
Brian Schmidt, won the 2011 Nobel Prize 
in Physics. Subsequently, string theory and 
inflationary cosmology were recognized as
providing a setting that could predict, or at
least motivate, the existence of a multiverse. 
Tegmark’s book captures two trends in 
contemporary science writing: scientific 
autobiography and the popular book as 
manifesto, expressing a body of personal 
scientific ideas ill-suited to traditional aca- 
demic journals. Accordingly, Tegmark inter- 
weaves the science with stories of personal 
contributions to the endeavour. Fortunately, 
he is an engaging host. Tegmark makes his 
manifesto explicit by chopping his research 
life into two parts. Around a quarter of the 
book covers the ‘sensible’ work on constrain- 
ing cosmological models from data. The rest 
is the outlandish part on quantum realities 
and multiple universes — even including 
an e-mail from a (sadly unnamed) senior 
academic advising him to desist before he 
destroys his career. It is clear where Tegmark’s 
priorities lie: this book is his statement on the 
multiverse as a valid model for reality. 
Tegmark likes the multiverse so much that 
he doesn’t settle for just one; instead, he offers
four different levels of multiverse. In the first, 
we simply have our own Universe, with its 
physical laws, extending forever. Shockingly, 
this is sufficient to ensure that, somewhere far 
away, there are exact replicas of you reading 
this review, on exact replicas of Earth. It might 
even be enough to imply that you are more 
likely to exist within a simulation of reality 
than in reality itself (whatever that means). 
In the second incarnation, perhaps the 
most popular among proponents, physical 
laws vary within the multiverse so that distant 
regions can be considered to be distinct uni- 
verses. This version is necessary to explain, 
for instance, the cosmological constant and 
other apparent coincidences in physical laws 
such as the stability of neutrons within nuclei. 
In the third level, the parallel universes may 
exist only as quantum mechanical states. 
The culmination that Tegmark seeks to 
lead us to is the “Level IV multiverse”. This 
level contends that the Universe is not just 
well described by mathematics, but, in fact, 
is mathematics. All possible mathematical 
structures have a physical existence, and 
collectively, give a multiverse that subsumes 
all others. Here, Tegmark is taking us well 
beyond accepted viewpoints, advocating his 
personal vision for explaining the Universe. 
This is a valuable book, written in a decep- 
tively simple style but not afraid to make 
significant demands on its readers, especially 
once the multiverse level gets turned up to 
four. It is impressive how far Tegmark can 
carry you until, like a cartoon character
running off a cliff, you wonder whether there is
anything holding you up.


Andrew Liddle is a theoretical cosmologist 
at the Institute for Astronomy, University of 
Edinburgh, UK. 

e-mail: arl@roe.ac.uk 


Books in brief 


The Accidental Universe: The World You Thought You Knew 

Alan Lightman PANTHEON (2014) 

Theoretical physicist Alan Lightman’s meditation on how recent 
findings in science shape our concept of self and Universe unfolds 
with the mesmeric calm of a vessel in space. That is, until he treats 
us to some split-second encounter with a sliver of the totality — such 
as the eye of a flying osprey. In seven elegant essays on aspects
of the Universe, Lightman takes us from the idea of an accidental
cosmos, prompted by multiverse theory, to the Higgs boson, digital 
disembodiment and the cosmic evanescence that fits so poorly with 
our longing for permanence. 


The Science of Cheese 

Michael Tunick OXFORD UNIVERSITY PRESS (2013) 

From “smear-ripened” Swiss tilsit to the maggot-riddled casu marzu 
of Italy, cheese can carry a whiff of the surreal. Chemist Michael 
Tunick tours a sample of the 2,000 known varieties, mingling science 
(biology, chemistry, physics, nutrition and climatology) and cultural 
lore to make an accessible whole. If you have ever wondered what 
links Limburger with foot perspiration (answer: short-chain fatty 
acids), or how to make mozzarella at home, Tunick is your man. And 
the world’s most expensive cheese? Made from moose milk on a 
Swedish farm, it will set you back US$1,000 a kilogram. 


Smarter: The New Science of Building Brain Power

Dan Hurley HUDSON STREET PRESS (2013)

Is “fluid” intelligence — the ability to think on your feet and discern
patterns — teachable? In this trip through the findings on and
controversies around brain training, science journalist Dan Hurley
proves an able, often caustically humorous guide. He starts with
2008 research on working-memory training by psychologists
Susanne Jaeggi and Martin Buschkuehl, then trawls research in
areas such as gaming and visual attention processing. After taking
the pulse at science conferences and turning guinea pig to test a
range of techniques, Hurley admits to cautious optimism.


Walden’s Shore: Henry David Thoreau and Nineteenth-Century 
Science 

Robert M. Thorson HARVARD UNIVERSITY PRESS (2013) 

In his 1854 masterpiece Walden, the US writer and naturalist 
Henry David Thoreau invites us to “wedge our feet downward ... till 
we come to a hard bottom and rocks in place, which we can call 
reality”. Geologist Robert Thorson obliges, focusing on Thoreau as 
a flinty amateur geologist to reinject science into his literary legacy. 
Thoreau, Thorson persuasively argues, was as grounded in rock as 
he was in the elemental understanding of the cosmos sought by the 
Transcendentalist movement. 


The Monkey’s Voyage: How Improbable Journeys Shaped the 
History of Life 

Alan de Queiroz BASIC BOOKS (2014)

Biogeography is undergoing a sea change, argues Alan de Queiroz.
The dominant theory of global species dispersal previously centred
on the break-up of the supercontinent Gondwana, starting some
160 million years ago. Now, the idea of species traversing oceans is
gaining ground. Perhaps the most compelling scenario is the ‘monkey
transfer’ from Africa to South America, envisioned as a simian troop
hitching a ride on an uprooted, floating tree. Barbara Kiser
Targeted MOOC
captivates students 


In our experience, massive
open online courses (MOOCs)
can reach non-traditional
and disadvantaged learners
if they address a recognized
need, support the educational
requirements of the intended
cohort and enable learning with
tangible outcomes (see
E. J. Emanuel et al. Nature 503,
342; 2013).

The Wicking Dementia 
Research and Education 
Centre at Australia’s University 
of Tasmania has developed 
a MOOC on understanding 
dementia. This attracted almost 
10,000 people from more than 60 
countries in July, of whom 89%
were women, 70% were over the 
age of 40, and only 17% were 
educated beyond a bachelor’s 
degree (compared with 44% 
reported by Emanuel and 
colleagues). 

The course is tailored to the 
educational needs of the care 
workforce and family-based 
carers who support the more 
than 44 million people with 
dementia worldwide. 

Our cohort-centric approach 
involved structuring the course to 
support non-traditional learners, 
including providing online 
technical and teaching support. 


The completion rate for the 11-week course was 39%,
which is considerably better than the international average
for MOOCs (see C. Parr Times Higher Education 9 May 2013;
go.nature.com/p25g67).
Carolyn King, Andrew Robinson, James Vickers
Wicking Dementia Research and Education Centre, University of
Tasmania, Hobart, Australia.
carolyn.king@utas.edu.au


Storm-surge impact
depends on setting


A storm surge on 5-6 December threatened urban centres and
rural communities around the southern North Sea in a
similar way to such an event 60 years ago. Causing more
than 2,000 deaths, the 1953 flood was western Europe’s
most devastating in 100 years in terms of loss of life. Last month,
however, a disaster was averted by advances in storm-surge
forecasting, improved defences, early-warning systems and
integrated crisis management.

Immediately after the surge, we made high-resolution
measurements of maximum water levels, focusing on
obvious debris lines, erosion points on earthen bank defences
and water marks on buildings along the 45-kilometre-long
northern coastline of Norfolk in the United Kingdom. These
confirm that flood levels were similar to, and in places
exceeded, those in the 1953 disaster.

There was considerable variation in the mean height
of peak water levels along the shore (the maximum difference
between measurement stations was more than 1.2 metres).
This reflects the combined effects of tide, surge and wave
run-up, which has a strong local component. For this coastline
of barrier islands, spits and tidal embayments, these observations
indicate that the coastal setting and extent of coastal
ecosystems (such as mudflats and salt marshes) are critical
in determining the pattern of storm-surge impacts.

Such differences become crucial when properties,
infrastructure and lives are threatened by sea flooding (see
also J. D. Woodruff et al. Nature 504, 44-52; 2013). These factors
should be incorporated into hydrodynamic modelling and
forecasting efforts, to help fine-tune early-warning systems and
evacuation planning.

Thomas Spencer University of Cambridge, UK.
ts111@cam.ac.uk
Susan M. Brooks Birkbeck, University of London, UK.
Iris Möller Cambridge Coastal Research Unit, UK.


Whistle-blowers
have a tough time


Whistle-blower cases that go on forever are not uncommon (see
Nature 503, 454-457; 2013). The cold conclusion is that the
whistle-blower may survive, but the odds are against him or her.

I have worked with whistle-blowers for more than 35
years as an expert witness in court cases and as author of
the forthcoming book Don’t Kill the Messenger (see
www.whistleblowing.us), and find that they are hard to silence. The
truth-telling part of their brain seems to override the health and
safety part, so they will endure all forms of retaliation for the
sake of truth.

Institutions can also be very slow to admit to any mistakes on
their watch. This factor delays adjudication and makes it harder
for the whistle-blower to prove anything in court.

Don Soeken Whistleblower Support Fund, Ellicott City,
Maryland, USA.
helpline@tidalwave.net


NATURE’S READERS COMMENT ONLINE

A sample of responses from the debate on the
reproducibility drive (M. Bissell Nature 503, 333-334; 2013).


Nitin Gandhi says:

The very fact that we have to take the issue of replication
so seriously and spend lots of time and money on it in these
hard times speaks out loudly that things are not right in
biomedical research.


William Gunn says:

The Reproducibility Initiative aims to make science work
better for everyone [see go.nature.com/v5cljs]. The
worst that could happen is that we learn a lot about what level
of reproducibility to expect and how to reliably build on
a published finding. At best, funders will start tacking a
few per cent on to grants for replication purposes and
publishers will start asking for it. That can only be good for
science as a whole.


Anonymous says:

I would be a rich man if I had received a penny for every
time I heard the expression “in our hands” at a scientific
lecture. I disagree that “the push to replicate findings
could shelve promising research and unfairly damage
the reputations of careful, meticulous scientists”. I believe
that the opposite is true. Scientists should be
encouraged to report and publish when they fail
to replicate each other’s experiments. That will help
science (but maybe not scientific careers) progress
much faster.


Irakli Loladze says:

The current system does not penalize for publishing sexy
but non-reproducible findings. In fact, such publications
boost the chances of getting another grant. It is about
time to end this vicious cycle that benefits a few but hurts
science at large.




OBITUARY 


Frederick Sanger 


(1918-2013) 


Double Nobel-prizewinning genomics pioneer. 


Frederick Sanger, ‘the father of genomics’, 
was one of just four scientists to win two 
Nobel prizes and the only one to receive 
both in chemistry. Both were awarded for the 
invention of methods to determine the order 
of the biological building blocks of life. 

Sanger will be remembered especially for 
developing techniques to read out the As, Cs, 
Gs and Ts in a strand of DNA. This work provided 
the means to decipher genetic material 
and led to his second prize, which he shared 
with Paul Berg and Walter Gilbert in 1980. In 
the 1990s, Sanger’s eponymous method was 
used by laboratories around the world to work 
out the sequence of the human genome. 

His first prize came in 1958 for his discov- 
ery of how amino acids are strung together 
in the protein insulin. In the 1950s, many 
thought that the amino acids within a pro- 
tein were arranged randomly, but Sanger 
proved beyond doubt that they instead form 
a unique sequence. Although he made light 
of this conclusion, saying that those who 
knew about proteins expected this outcome, 
the knowledge that proteins had a precise 
sequence suggested that this information 
must be codified in DNA. 

Sanger, who died in Cambridge, UK, on 
19 November aged 95, was born in 1918 
in Gloucestershire. Raised as a Quaker, he 
learned self-reliance and practical manual 
skills as a schoolboy. These aptitudes were 
used to great effect in his laboratory and in 
building sailing boats. 

He developed an interest in science from 
his physician father and his older brother, 
with whom he enjoyed the outdoors. In 
1939, he graduated in biochemistry from 
St John’s College, Cambridge. A consci- 
entious objector, he stayed on at the Uni- 
versity of Cambridge during the Second 
World War to study the nutritional ben- 
efit of lysine in potatoes under biochemist 
Albert Neuberger. In 1940, Sanger married 
Margaret Joan Howe, an economics gradu- 
ate. They had three children and remained 
married until her death in 2012. Sanger 
ascribed his wife and his fellow researchers 
key roles in his success. 

After receiving his PhD in 1943, Sanger 
began the research that led to his first Nobel 
prize, working out how amino acids link up 
in the two polypeptide chains of insulin. 
He labelled the ends of the separate chains 
with a yellow dye, then hydrolysed them 
to amino acids and identified the tagged 
amino acid in each case. After using acid 
and enzymes to split each chain into defined 
fragments, he tagged purified fragments 
with the dye and repeated the process. From 
this, and from the amino-acid composition 
of the fragments, he deduced the order of 
amino acids in the intact protein, rather like 
building up a picture from the pieces of a 
jigsaw puzzle. 

Sanger preferred to be in the background 
but was not afraid to use his clout. He sup- 
ported a successful bid to the UK Medical 
Research Council (MRC) to build the Labo- 
ratory of Molecular Biology in Cambridge, 
which opened in 1962. Here, Sanger spent 
the rest of his active scientific life. 

After first working out ways to sequence 
RNA molecules, by which sequence infor- 
mation in genes is transferred into the 
sequences of proteins, Sanger took up the 
challenge of sequencing genes themselves. 
He developed a method that used enzymes 
to copy fragments of DNA. Four reactions 
were set up side by side, each supplied with 
the four standard building blocks, or nucleo- 
tides, (As, Cs, Gs and Ts), one of which was 
labelled with radioactive atoms. Each reac- 
tion also contained a modified version of 
A, C, G or T. Unlike standard nucleotides, 
these ‘chain terminators’ did not allow the 
DNA strand to grow further after they had 
been incorporated. Interrupted copies were 
separated according to their size on gels by 
an electric current and exposed to photo- 
graphic film, allowing the radioactivity to 
produce the now-iconic ‘ladders’ of dark 
bands. These bands revealed the length of 
the DNA copy and allowed the sequence to 
be read simply. By combining sequences of 
many DNA fragments, the sequence of the 
larger DNA molecule from which the frag- 
ments were derived could be deduced. 
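
The logic of reading a sequence from chain-terminated copies can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not Sanger's laboratory protocol: the function names, the termination probability and the number of copies are illustrative choices, and the sketch ignores strand direction and the radioactive label.

```python
import random

# Base-pairing rules used when the enzyme copies the template.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_lengths(template, terminator, copies=2000, stop_prob=0.2, rng=None):
    """Simulate one of the four reactions.

    Each copy grows base by base; whenever the base being added matches
    `terminator`, growth stops with probability `stop_prob` (the chance
    that a chain terminator was incorporated instead of a normal base).
    Returns the set of lengths of all interrupted copies.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the toy example repeatable
    new_strand = [COMPLEMENT[base] for base in template]
    lengths = set()
    for _ in range(copies):
        for position, base in enumerate(new_strand, start=1):
            if base == terminator and rng.random() < stop_prob:
                lengths.add(position)
                break
    return lengths


def read_ladder(template, **kwargs):
    """Read the copied strand from the four 'lanes', shortest fragment first."""
    lanes = {base: terminated_lengths(template, base, **kwargs) for base in "ACGT"}
    longest = max(max(lengths) for lengths in lanes.values() if lengths)
    read = []
    for length in range(1, longest + 1):
        # A band of this length appears in exactly one lane; "?" marks a gap.
        base = next((b for b, lengths in lanes.items() if length in lengths), "?")
        read.append(base)
    return "".join(read)


copied = read_ladder("TACGGATC")
print(copied)
```

Each fragment length shows up in only one of the four lanes, so stepping through the lengths in order spells out the copied strand, which is the complement of the template.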

Sanger demonstrated the power of his 
method by sequencing genomes of ever- 
increasing size, starting with a simple bac- 
terial virus (5,386 nucleotides) in 1977, 
then the DNA in the mitochondria of 
human cells (16,569 nucleotides) in 1981 
and, finally, the genome of a complex bac- 
terial virus, bacteriophage lambda (48,502 
nucleotides), in 1982. 

In 1993, nine years after Sanger retired, 
the Wellcome Trust and the MRC opened 
the Sanger Centre (now the Wellcome Trust 
Sanger Institute) near Cambridge, where a 
considerable part of the human genome was 
decoded with the technique he developed. In 
the 2000s, Sanger sequencing gradually gave 
way to faster, cheaper techniques that detect 
nucleotides as they attach to growing DNA 
strands. But Sanger sequencing remains the 
gold standard. The highly accurate tech- 
nique is increasingly being applied to the 
genomes of individual humans and even 
individual cells within tumours. Sanger’s 
impact on biology is as dramatic as that of 
Charles Darwin. 

Sanger was happiest at the laboratory 
bench, where he worked tirelessly and single- 
mindedly. He performed elegant experiments 
with simple apparatus to solve extremely 
difficult problems. In so doing, he inspired 
younger scientists and attracted some of the 
best biologists in the world to Cambridge. 

Sanger was famously understated, but he 
knew that he was an extraordinary scientist, 
and when the occasion demanded it he was 
prepared to say so. When colleagues assem- 
bled after the announcement of his second 
Nobel Prize, one praised his characteristic 
modesty. Sanger responded: “I want you all 
to know that I think that I am bloody good.” 
He was showered with awards and quietly 
enjoyed the recognition. After retirement, 
he continued to build boats and developed a 
magnificent English garden. 


John Walker received the Nobel Prize in 
Chemistry in 1997. From 1974 to 1984, he 
worked alongside Frederick Sanger at the 
Medical Research Council Laboratory of 
Molecular Biology. 

e-mail: walker@mrc-mbu.cam.ac.uk 
Resistance nailed 


A series of in vitro, genomic, ecological and epidemiological studies has pinpointed gene mutations in the malaria parasite 
Plasmodium falciparum that play a key part in resistance to artemisinin-based antimalarial drugs. SEE ARTICLE P.50 


CHRISTOPHER V. PLOWE 


A fragile consensus that the global 
eradication of malaria is possible, 
and prospects for eventually 
achieving this audacious goal, are being 
threatened by the emergence in south- 
east Asia of parasite resistance to the 
drug artemisinin and its derivatives. On 
page 50 of this issue, Ariey et al.' report 
welcome news: they have identified a 
molecular marker of artemisinin- 
resistant malaria that can be used to 
map resistance and guide efforts to 
eliminate it. 

Artemisinin-based combina- 
tion treatments have contributed 
to reductions in the global burden of 
malaria, prompting the Bill & Melinda 
Gates Foundation and the World Health 
Organization to issue a call in 2007 for 
an international push towards malaria 
eradication’. Artemisinin drugs nor- 
mally clear malaria parasites from 
the blood of a patient within two days 
of starting treatment; now, however, 
increasing numbers of Plasmodium falci- 
parum infections in western Cambodia, 
southern Vietnam, eastern Myanmar 
and western Thailand take up to five 
days to clear. In some areas, artemisinin- 
based combination therapies are start- 
ing to fail completely, with persistence 
of both infection and clinical illness after what 
should be curative treatment. 

Efforts to contain artemisinin resistance in 
southeast Asia and to eliminate malaria would 
be aided enormously by the identification of 
a molecular marker for this drug resistance. 
Such markers are available for resistance to 
other antimalarial drugs for which the genetic 
determinants of resistance in the parasite are 
known. However, neither the mechanism of 
artemisinin action nor the mechanism(s) of 
resistance are understood. Examinations of 
the P. falciparum genome for regions of recent 
strong evolutionary selection, and targeted 
and genome-wide association studies, have 
implicated two adjacent regions on chro- 
mosome 13 as potential sites of a resistance- 
determining gene or genes**. Through dogged 
determination and a remarkable combination 
of approaches, Ariey and colleagues seem to 
have won the race to identify if not the gene, 
at the very least a key gene, responsible for 
artemisinin resistance. 

Figure 1 | Propeller mutations. Ariey et al.¹ show that mutations 
in a Plasmodium falciparum gene that encodes the kelch protein 
K13 are associated with both in vitro and clinical measures of 
artemisinin-resistant P. falciparum malaria in Cambodia. The 
mutations (orange spheres) encode amino-acid changes in the 
‘propeller blades’ of this protein, which resembles a child’s pinwheel 
and is thought to be involved in various protein-protein interactions. 

In what seemed like a long shot, the 
researchers laboriously grew an artemisinin- 
sensitive parasite isolated from a Tanzanian 
individual in culture for five years, exposing 
it intermittently to artemisinin. The drug 
was removed when the parasites’ growth 
faltered, and replaced after they bounced 
back. After 60 cycles of drug pressure, the 
proportion of parasites surviving a pulse of 
artemisinin had increased from less than 0.01% 
to more than 10%. Genome sequencing of this 
population revealed eight single-nucleotide 
mutations (single nucleotide polymorphisms, 
or SNPs) in seven genes that were present 
in the resistant parasites but not in their 
siblings grown in a parallel culture 
without drug exposure. 

The suspects were cornered. The 
culprit was then identified when the 
authors looked for these candidate 
mutations in parasite lines from Cam- 
bodia that had varying susceptibilities 

to artemisinin drugs in vitro’. After 

ruling out candidate genes from the 

Tanzanian isolate that showed no 
sequence variation in the Cambodian 
isolates, and genes containing SNPs 
that were not associated with in vitro 
resistance, a single gene remained 
with resistance-associated SNPs. The 
gene is located on chromosome 13, 
within a candidate region identi- 
fied in a recent genome-wide asso- 
ciation study of clinical artemisinin 
resistance’. 

The gene in question encodes 
a kelch protein called K13. Kelch 
proteins are involved in a variety of 
protein-protein interactions, and 
contain several regions of repeating 
amino-acid sequences, each forming a 
‘propeller blade’ (Fig. 1). Further evi- 
dence for the central role of K13-pro- 
peller SNPs in resistance was provided 
by Ariey and colleagues’ ecological sur- 
vey of malaria parasites from several 
Cambodian provinces. K13-propeller 
mutations were rare or absent in sam- 
ples from provinces with virtually no docu- 
mented resistance, but were widespread in 
provinces where resistance has been reported. 
Moreover, their prevalence in these provinces 
has increased during the past decade, contem- 
poraneously with increases in the prevalence 
of resistance. 

The authors went on to show that K13 SNPs 
were also strongly correlated with delayed 
parasite clearance following artemisinin 
treatment in clinical trials. And a final piece 
of evidence came from subpopulations of 
Cambodian parasites that can be segregated 
into sensitive and resistant groups’. The prev- 
alence of K13 SNPs not only correlated with 
resistance among these subpopulations, they 
actually did a better job of explaining resist- 
ance than did population groupings: sensi- 
tive parasites assigned on the basis of their 


genomic profile to ‘resistant’ subpopulations 
had wild-type K13, and resistant parasites 
belonging to ‘sensitive’ subpopulations carried 
the SNPs. 

Definitive proof that K13-propeller muta- 
tions confer artemisinin resistance will come 
from genetic transformation of drug-sensitive 
parasites into resistant ones by the replacement 
of their wild-type K13 gene with a mutated 
gene. It is of course possible — even probable 
— that other genes contribute to artemisinin 
resistance, but this study leaves little doubt 
that K13 is a major determinant of resistance 
to these drugs in P. falciparum malaria. Fur- 
ther study of gene variants in the chromo- 
somal vicinity of the K13 gene will ascertain 
whether resistance arose once in western 
Cambodia and then spread — in which case 
the genomic regions containing the K13 SNPs 
will show extended surrounding sequence 
similarity, indicating common ancestry — or 
whether it emerged independently in different 
geographical locales. If resistance has arisen 
independently in many areas, local contain- 
ment efforts will be futile and only regional 
elimination offers any hope of preventing its 
spread to Africa, where the arrival of drug- 
resistant Asian parasites has previously led to 
marked increases in malaria hospitalizations 
and deaths’. 

Validation of this molecular marker of arte- 
misinin resistance outside Cambodia will be 
easily achieved, and mapping of the marker 
throughout southeast Asia is already under 
way, thanks to early sharing of the results of 
this study with local malaria-control workers 
and researchers. With at least 17 SNPs residing 
in the propeller domains of K13, only one of 
which is found in any one parasite, sequenc- 
ing of the K13 gene will initially be necessary 
to map resistance. But if a few specific SNPs 
emerge as being predictive of resistance in dif- 
ferent settings, switching to rapid molecular 
assays using DNA extracted from dried blood 
spots will accelerate translation of this research 
finding into a practical tool for public-health 
surveillance. 


Christopher V. Plowe is at the Howard 
Hughes Medical Institute and the Center for 
Vaccine Development, University of Maryland 
School of Medicine, Baltimore, Maryland 
21201, USA. 

e-mail: cplowe@medicine.umaryland.edu 


EXTRASOLAR PLANETS 


Cloudy with a 
chance of dustballs 


The flat and featureless transmission spectra of two intermediate-sized 
extrasolar planets, observed during the planets’ passage across their host stars, 
shed light on the properties of their atmospheres. SEE LETTERS P.66 & P.69 


JULIANNE MOSES 


Flat spectra do not typically excite astron- 
omers, but there are times when a lack 
of spectral features tells you something 
interesting. Such is the case with observations 
of two separate sub-Jupiter-sized extrasolar 
planets made using the Hubble Space Tele- 
scope and reported by Knutson et al.¹ and 
Kreidberg et al.² on pages 66 and 69 of this 
issue, respectively. 

The first planet, GJ 436b, has a mass and 
radius slightly greater than Neptune’s. The sec- 
ond, GJ 1214b, is smaller, with a radius roughly 
2.7 times that of Earth. Both exoplanets orbit 
very close to their host stars and are therefore 
quite warm by Earth standards. Owing to 
intensive observational scrutiny since their 
respective discoveries, these two planets have 
become the archetypes of the new ‘Neptune- 
class’ and ‘super-Earth’ categories of exoplanets. 

Although the first extrasolar planets discov- 
ered — in the 1990s and 2000s — tended to 
be hot, massive, hydrogen-dominated worlds 
that most closely resemble Jupiter, astrono- 
mers have been methodically and efficiently 
chipping away at the harder-to-observe regime 
of smaller, denser and cooler Earth-like planets. 
Recent ground- and space-based surveys dem- 
onstrate** that planets of sizes ranging between 
those of Earth and Neptune overwhelmingly 
dominate the observed exoplanet population. 
But what are these intermediate-sized exo- 
planets really like? Our Solar System does not 
provide sufficient clues about these planets, 
because we have only Earth and Venus at one 
end of the scale and cold Uranus and Neptune 
at the other to serve as examples. Do these 
mid-sized exoplanets have rocky surfaces, like 
Earth and Venus? Are they fluid planets with 
thick, deep atmospheres relatively rich in 
hydrogen and volatile elements, like Uranus 
and Neptune? Are they ‘water worlds’ with 
steam atmospheres overlying deep oceans? Are 
the atmospheres of these planets thick or thin, 
consisting predominantly of hydrogen, water, 
carbon dioxide or nitrogen, or something more 
exotic? Observations of GJ 436b and GJ 1214b 
provide important hints. 

The orbital planes of both GJ 436b and 
GJ 1214b are almost exactly edge-on as seen 
from Earth, so that the planets periodically 
transit — pass directly in front of — their host 
stars, causing slight dips in the amount of 
light seen from the system. The depth of these 
transit dips allows the planetary radius to be 
determined, and their wavelength dependence 
provides information about atmospheric 
composition. During transits, the stellar light 
passes through the planet’s atmosphere on 
its way to the observer. For a planet with an 
extended atmosphere, the apparent size of 
the planet can vary with observed wavelength 
because more stellar light is blocked at high 
altitudes at wavelengths for which atmospheric 
constituents have strong absorption bands. 
Conversely, more light passes through at low 
altitudes at wavelengths for which atmospheric 
constituents are less absorbing. 
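
To first order, the depth of a transit dip is the ratio of the areas of the planetary and stellar discs, depth ≈ (Rp/Rs)². The following is a minimal Python sketch of that relation. It uses the radius of 2.7 Earth radii quoted above for GJ 1214b and assumes a host-star radius of about 0.21 solar radii, a value that is not given in this article and is included only for illustration.

```python
R_EARTH_KM = 6371.0      # mean radius of Earth
R_SUN_KM = 695_700.0     # nominal radius of the Sun


def transit_depth(planet_radius_km, star_radius_km):
    """Fractional dip in starlight when the planet's disc covers part of the star's."""
    return (planet_radius_km / star_radius_km) ** 2


planet_radius = 2.7 * R_EARTH_KM   # from the text
star_radius = 0.21 * R_SUN_KM      # assumed value, not from the article

depth = transit_depth(planet_radius, star_radius)
print(f"Transit depth: {depth:.2%}")
```

With these inputs the dip is a little over 1% of the star's light. Measuring the depth at many wavelengths gives an apparent radius at each one, and it is the variation of that radius with wavelength that makes up the transmission spectrum.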

GJ 1214b is known to have a relatively flat 
(wavelength-independent) transmission spec- 
trum, with weak absorption features at best®”. 
Because the bulk density of the planet suggests 
that it must have a gaseous envelope, the two 
leading theories for explaining the flat trans- 
mission spectrum involve either widespread, 
high-altitude clouds or a hydrogen-poor 
atmosphere dominated by a high-molecular- 
weight constituent such as water or carbon 
dioxide (Fig. 1). In both cases, the stellar light 
at most wavelengths would be extinguished 
fairly abruptly within a small vertical region 
of the atmosphere. 
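
Why a heavy atmosphere flattens the spectrum can be illustrated with the standard pressure scale height, H = kT/(μ m g), the vertical distance over which pressure falls by a factor of e. The Python sketch below compares a hydrogen-rich mixture with a water-dominated one. The temperature and gravity are placeholder values assumed for illustration and are not taken from the article or from the two papers.

```python
K_B = 1.380649e-23         # Boltzmann constant, J/K
M_U = 1.66053907e-27       # atomic mass unit, kg


def scale_height_km(temperature_k, mean_molecular_weight, gravity_m_s2):
    """Pressure scale height H = k*T / (mu * m_u * g), returned in kilometres."""
    metres = K_B * temperature_k / (mean_molecular_weight * M_U * gravity_m_s2)
    return metres / 1000.0


TEMPERATURE_K = 500.0      # assumed illustrative temperature
GRAVITY_M_S2 = 9.0         # assumed illustrative gravity

h_hydrogen_rich = scale_height_km(TEMPERATURE_K, 2.3, GRAVITY_M_S2)   # H2/He mix
h_water = scale_height_km(TEMPERATURE_K, 18.0, GRAVITY_M_S2)          # steam

print(f"Hydrogen-rich: {h_hydrogen_rich:.0f} km")
print(f"Water-dominated: {h_water:.0f} km")
print(f"Ratio: {h_hydrogen_rich / h_water:.1f}")
```

Whatever temperature and gravity are chosen, the ratio of the two scale heights depends only on the molecular weights (18/2.3, or roughly 8), so absorption features in a water-dominated atmosphere span a much thinner layer and are correspondingly harder to detect.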

In their study, Kreidberg and colleagues 
present near-infrared transmission spectra 
for GJ 1214b that finally allow one of the two 
competing theories to be ruled out for this 
super-Earth exoplanet. The extremely pre- 
cise spectra, obtained from the Wide Field 
Camera 3 (WFC3) on board the Hubble Space 
Telescope, demonstrate that GJ 1214b’s trans- 
mission spectrum is so flat and featureless in 
a wavelength region between about 1.1 and 
1.6 micrometres that high-altitude clouds 
provide the only plausible explanation. The 
observations are precise enough that spec- 
tral features from a cloud-free atmosphere 
dominated by heavy molecules such as water, 
methane, carbon monoxide or carbon dioxide 
would have been detectable if such an atmos- 
phere were present on GJ 1214b. Even an 
atmosphere composed of 99.9% spectrally 
neutral nitrogen with 0.1% water can be 
rejected on the basis of the lack of water- 
absorption features. 

Figure 1 | Exoplanetary atmospheres. Flat transmission spectra of exoplanets during transit, such as 
those reported by Knutson et al.¹ and Kreidberg et al.², can result from a planet with an atmosphere that 
either contains high clouds (a) or that is hydrogen poor with a high mean molecular weight (b). In the 
cloudy case, photons from the planet’s host star are blocked abruptly when they encounter the cloud 
layer on their way to an observer on Earth. In the hydrogen-poor case, the high molecular weight of the 
atmosphere allows it to be bound tightly by gravity and therefore be vertically compressed, with large 
changes in density over relatively small vertical scales providing a relatively sudden absorption of all 
the stellar photons. If the planet had a clear (cloud-free), low-mean-molecular-weight atmosphere 
(not shown), atmospheric absorption features would be more prominent in the transmission spectrum. 

Meanwhile, new WFC3 observations of 
GJ 436b presented by Knutson and colleagues 
point to a similarly flat and featureless trans- 
mission spectrum between 1.1 and 1.6 μm 
for this Neptune-class planet. Given that one 
might expect the more massive GJ 436b to 
contain more hydrogen than GJ 1214b, the flat 
spectrum is, in this case, an even bigger sur- 
prise — a hydrogen-rich atmosphere would be 
vertically extensive, and expected trace species 
such as water and methane would have promi- 
nent deep absorption bands. However, unlike 
the situation for GJ 1214b, Knutson et al. dem- 
onstrate that a hydrogen-poor atmosphere 
(with or without clouds) and a hydrogen-rich 
atmosphere with high clouds are both statisti- 
cally viable solutions to explain the observed 
flat transmission spectrum for GJ 436b. To dis- 
tinguish between these scenarios, more precise 
moderate-resolution spectral observations at 
near-infrared wavelengths will be needed to 
unambiguously reveal any spectral features. 
Longer-wavelength eclipse observations’, 
acquired when the planet passes behind the 
star, could also help to discriminate between 
the two hypotheses. 

Evidence is mounting that the hydrogen 


fraction within a planet is a strong function of 
planet size’, so it is not necessarily an ‘either-or’ 
situation for explaining the flat transmission 
spectra of GJ 436b and GJ 1214b: the atmos- 
pheres could be cloudy and have a large mean 
molecular weight. However, high-altitude 
clouds on these two exoplanets would not 
resemble the clouds we see in the Solar Sys- 
tem. Possible candidates include potassium 
chloride or zinc sulphide ‘dust’ clouds. For the 
case of a relatively hydrogen-poor atmosphere, 
these two components would form clouds that 
are optically thick (opaque) enough at high 
altitudes on both planets that the transmitted 
stellar light would be abruptly blocked, leading 
to a flat transmission spectrum”. Alternatively, 
thick hazes such as those seen around Saturn's 
moon Titan could be produced from photo- 
chemical processing of atmospheric gases by 
ultraviolet stellar photons, although the lack 
of evidence for methane on either of these 
two planets”® suggests that any photochemi- 
cal hazes present would be decidedly different 
from those on Titan. 

Hydrogen-poor or not, dust-shrouded or 
not, super-Earth and Neptune-class planets 
collectively represent an intriguing and pop- 
ulous type of extrasolar planet whose exotic 
atmospheres may have no true analogues in 
the Solar System. The transmission spectra 
presented here — flat and featureless, and yet 
full of information — provide one piece of the 
puzzle needed to characterize such planets. 
Four makes a party 


Adding the first high-quality Neanderthal sequence to genomic comparisons of 
archaic and modern humans sheds light on gene flow, population structure and 
adaptation, and suggests the existence of an unknown group. SEE ARTICLE P.43 


EWAN BIRNEY & JONATHAN K. PRITCHARD 


Archaic humans have captured the 
imagination since the nine- 
teenth century, when the remains of 
Neanderthals were discovered in the Nean- 
der valley of northern Germany and else- 
where in Europe. Until recently, Neanderthals 
and other archaic humans were known only 
from bones and various artefacts, but DNA- 
sequencing technology is now providing us 
with new perspectives on these early groups 
and their relationships to modern humans. 
In this issue, Prüfer et al.¹ (page 43) report 
the first high-quality genome sequence of a 
Neanderthal individual. Their work adds to 
an emerging story about a tangled web of gene 
flow among modern humans and different 
early hominins (humans and archaic groups 
that are more closely related to humans than to 
chimpanzees), and hints at the existence of an 
unknown, highly diverged hominin group that 
contributed to this archaic gene pool. 
Neanderthals are thought to have persisted 


in southern Europe until around 30,000 years 
ago’, thus potentially overlapping with modern 
humans. As a result, there has long been 
interest in whether Neanderthals might have 
interbred with early Europeans. In the 1990s, 
the first comparisons of DNA sequences 
from modern humans and Neanderthals** 
suggested a rather simple story: that mod- 
ern humans emerged from Africa during 
the past 100,000 years, and spread around 
the globe without receiving genetic contri- 
butions from hominins that had left Africa 
much earlier. 

These early studies were based on sequences 
from mitochondrial DNA, which is easier than 
nuclear DNA to capture in ancient samples but 
represents only a tiny fraction of the human 
genome. However, the past few years have seen 
a revolution in our ability to obtain nuclear- 
genome sequences from ancient samples”, 
and these data are providing startling insights. 
One surprise was the first clear evidence for 
interbreeding between Neanderthals and 
modern humans’; another was the discovery of 
a second type of archaic hominin in Eurasia in 
addition to Neanderthals. This group, dubbed 
the Denisovans, is known mainly from the 
genome sequence of a single finger bone found 
in a cave in the Altai Mountains in Siberia‘®”. 

Although the Neanderthal bone from which 
Prüfer et al. derived their genomic sequence 
was found in the same Siberian cave, its owner 
is estimated to have lived several thousand 
years earlier than the Denisovan individual, 
and the two populations that the individu- 
als represent are not closely related. The 
ancestors of Neanderthals and Denisovans 
diverged from the main human lineage about 
600,000 years ago, and then split from each 
other around 400,000 years ago (Prüfer et al. 


discuss these estimates and associated caveats 
in detail). Thus, Neanderthals and Denisovans 
were quite distinct populations, having been 
separated for roughly three times longer than 
any modern human populations. 

Prüfer and colleagues’ sequence compari- 
sons provide further detail about the extent of 
interbreeding between the different hominin 
groups living during the Pleistocene period 
(see Fig. 8 of the paper'). The authors offer 
a more confident estimate of the Neander- 
thal contribution to the genomes of modern 
humans: about 2% for non-Africans (Africans 
have no detectable Neanderthal ancestry). 
They also report gene flow from Neanderthals 
into Denisovans that includes input at func- 
tionally important genomic regions involved 
in immunity and sperm function. Earlier work 
had shown that the main Denisovan contri- 
bution to modern humans is found in some 
populations in Oceania and, to a lesser extent, 
in east Asians”. 

Most provocatively, Prüfer et al. find 
evidence for modest levels of gene flow into 
Denisovans of sequence that is different from 
that of any known group, implying that there is 
at least one more, so far undiscovered, archaic- 
hominin group (Fig. 1). Low levels of gene flow 
have been observed in other radiations of spe- 
cies, so evidence for inter-hominin breeding 
should not be a tremendous surprise’; how- 
ever, it does seem that Eurasia during the Late 
Pleistocene was an interesting place to be a 
hominin, with individuals of at least four quite 
diverged groups living, meeting and occasion- 
ally having sex. 

The Neanderthal and Denisovan genomes 
also share another intriguing feature: they 
both have extremely low genetic diver- 
sity, with only about two heterozygous sites 
Figure 1 | Gene flow from an unknown ancient population. Prüfer et al.¹ calculate that modern 
Africans show greater genomic similarity to Neanderthals than to Denisovans. Average sequence 
divergence along the lineage leading to modern Africans is 7.47% since the last common ancestor 

with Neanderthals, and 7.71% since the last common ancestor with Denisovans (both numbers 

represent the percentage of divergence since the human-chimpanzee split). This difference is highly 
significant, and is inconsistent with a simple model in which the entire Neanderthal and Denisovan 
genomes come from the same source population. The best alternative model identified by the authors is 
that there was flow of a small contribution of genomic material (0.5%-8%) into Denisovans from a highly 


diverged, unknown population. 
50 Years Ago 


The Continental Shelf Bill, which 
received its second reading in the 
House of Lords on December 3, 
originated in the Conference on 
the Law of Sea at Geneva in 1958, 
which resulted in the Continental 
Shelf Convention and the High Seas 
Convention. The former, which 
the Government intends to ratify 
if the Bill becomes law, clarified 
international law concerning those 
large submarine areas outside the 
territorial seas where the depth 

of the water allows the natural 
resources of the sea-bed and subsoil 
to be exploited ... In the North 

Sea ... Britain will have rights over 
any deposits up to a line half-way 
across to Holland, Belgium and 
other coastal States, subject to any 
adjustments resulting from the 
negotiations that the Government 
proposes to undertake after 
ratifying the Convention. 

From Nature 4 January 1964 


100 Years Ago 


Major H. G. Joly De Lotbiniere 

has contributed to The Quarterly 
Review for October a valuable and 
timely article on the position of 
forestry in England and abroad, 

in which he reviews the principal 
timber resources of the world, and 
the steps that have been taken in 
England and elsewhere to provide 
for the future. As he points out, 
experts in every country are agreed 
that the world’s supply of timber 

is rapidly diminishing, and that 
unless vigorous steps are taken in 
the afforestation of suitable waste 
lands a shortage of material must be 
experienced long before the close 

of the present century. The author 
indicates in a general way the lines 
on which the work of afforesting the 
sixteen million acres of mountainous 
and heath land in this country 
should be proceeded with, and urges 
the necessity for immediate action. 
From Nature 1 January 1914 
(sequence differences between the paired 
homologous chromosomes) per 10,000 
nucleotides. This equates to only around 
one-quarter of the genetic diversity of mod- 
ern humans. The Neanderthal individual 
sequenced by Prüfer et al. had reduced 
heterozygosity in part because she was 
inbred (her parents were as related as half- 
siblings). However, the authors’ analysis 
suggests that the primary cause of the low vari- 
ability is that both groups had extremely small 
effective population sizes for the preceding 
100,000 years or more. 
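
The heterozygosity figure quoted here is simply a count of positions at which an individual's two chromosome copies differ, scaled to 10,000 nucleotides. A minimal Python sketch of that calculation follows; the function name and the toy sequences are hypothetical and stand in for real genome data.

```python
def heterozygous_per_10k(copy_a, copy_b):
    """Differences between two aligned chromosome copies, per 10,000 nucleotides."""
    if len(copy_a) != len(copy_b):
        raise ValueError("sequences must be aligned and of equal length")
    differing = sum(1 for a, b in zip(copy_a, copy_b) if a != b)
    return 10_000 * differing / len(copy_a)


# Toy example: 20 aligned positions with a single difference at the end.
maternal = "ACGT" * 5
paternal = "ACGT" * 4 + "ACGA"
toy_rate = heterozygous_per_10k(maternal, paternal)
print(toy_rate)

# The article's figures: about 2 sites per 10,000 in the archaic genomes,
# said to be around one-quarter of modern human diversity.
archaic_rate = 2.0
implied_modern_rate = archaic_rate / 0.25
print(implied_modern_rate)
```

Taken at face value, the two numbers in the text imply roughly eight heterozygous sites per 10,000 nucleotides in modern humans.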

Not only are these diversity estimates low 
compared with the genetic diversity of mod- 
ern humans, they are also among the lowest 
levels of genetic diversity reported for any 
organism''. These small population sizes seem 
paradoxical given the large geographical range 
of Neanderthals (and perhaps also of Denis- 
ovans), but they suggest that the population 
densities of these hominins were extremely 
low. Might these archaic hominins have been 
on their way to extinction even in the absence 
of any competition they may have experienced 
from modern humans? 

The new Neanderthal genome will also 
provide insight into the evolution of modern 
humans. Prüfer et al. report that there are 
just 96 protein-coding positions at which the 
Neanderthal sequence differs from that of all 
modern humans, with around a further 35,000 
such differences at non-coding positions, some 
of which may affect gene regulation. This 
catalogue is an intriguing starting point for 
studying the functions of genetic differences 
between these groups; for example, this list 
is short enough to imagine creating cell lines 
or mouse models that contain each specific 
change. However, one must be mindful that 
many human attributes, such as bipedal gait 
and complex culture, probably evolved before 
this period of hominin diversification, and that 
additional important variants may lie in parts 
of the genome that are difficult to sequence 
using current methods. 

After years of challenges, ancient-DNA 
studies are coming into their own, but they 
are raising as many questions as they answer. 
How many distinct archaic hominin groups 
were around in the Late Pleistocene? What 
were their geographical distributions? How 
did they help to shape the genetic make-up 
of modern humans? The recent sequencing 
of a 24,000-year-old Siberian specimen and 
the recovery of mitochondrial DNA from a 
400,000-year-old hominin are examples of 
how each new ancient genome adds signifi- 
cantly to our understanding of both recent 
and more distant human history. We can 


Clouds of uncertainty 


An evaluation of atmospheric convective mixing and low-level clouds in climate 
models suggests that Earth’s climate will warm more than was thought in 
response to increasing levels of carbon dioxide. SEE ARTICLE P.37 


HIDEO SHIOGAMA & TOMOO OGURA 


Earth is warming because of increased 
atmospheric concentrations of greenhouse 
gases, including carbon dioxide, 
caused by human activities. To develop policies 
that can help to control anthropogenic interference 
in climate, estimates of climate sensitivity 
(the mean global temperature response 
to a doubling of CO₂ levels) are required, and 
have been sought for decades. But despite tech- 
nical advances and the considerable efforts of 
climate scientists, the range of climate sensi- 
tivities estimated by the Intergovernmental 
Panel on Climate Change (IPCC) using com- 
puter models has not narrowed since 1990, 
and remains at roughly 1.5-4.5 °C (ref. 1). 
Low-level clouds occurring below 2-3 kilo- 
metres over the tropical ocean respond in 
various ways to a doubling of CO₂ in different 
models (Fig. 1), and so are key contributors 
to the uncertainty of climate sensitivity. 
On page 37 of this issue, Sherwood et al. 
present an observational test of atmospheric 
convective mixing that is relevant to low-level 
cloud responses, and they suggest that higher 
climate sensitivities are more likely than 
lower ones. 
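
To make the definition of climate sensitivity concrete, the sketch below converts a sensitivity value into an equilibrium warming for an arbitrary change in CO₂ concentration. It assumes the common approximation that warming scales with the number of CO₂ doublings; that assumption and the function name are illustrative and are not taken from Sherwood and colleagues' paper.

```python
import math


def equilibrium_warming(sensitivity_c: float, co2_ratio: float) -> float:
    """Equilibrium warming (deg C) for a given CO2 concentration ratio.

    Assumes warming is proportional to the number of CO2 doublings,
    log2(co2_ratio). This logarithmic scaling is an approximation.
    """
    if co2_ratio <= 0:
        raise ValueError("co2_ratio must be positive")
    return sensitivity_c * math.log2(co2_ratio)


# Ends of the IPCC range quoted in the text, applied to a doubling
# and to a quadrupling of CO2.
for ecs in (1.5, 4.5):
    print(ecs, equilibrium_warming(ecs, 2.0), equilibrium_warming(ecs, 4.0))
```

Under this approximation, a sensitivity range of 1.5-4.5 °C per doubling corresponds to 3-9 °C for a quadrupling, which shows how strongly the uncertainty in sensitivity carries over into projected warming.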

Low-level clouds reflect incoming sunlight 
from space, and so cool the climate. If the 
amount of this cloud declines steeply as the 
climate warms, then more sunlight will reach 
the surface, an effect that contributes to higher 
climate sensitivity. By contrast, increases 
in low-level cloud result in lower climate 
sensitivity. 

Sherwood and colleagues propose a mecha- 
nism that controls changes in the amount of 
low-level cloud. They reason that, as the cli- 
mate warms, stronger mixing of water vapour 
between the low-level cloud layer and the layer 
of the atmosphere above it desiccates the low- 
level cloud layer, reducing the amount of cloud. 
To assess the effect of this in climate models, 
the authors defined and computed measures of 
mixing strength for 43 models that contributed 
expect many more exciting stories in the 
coming years. 
to the IPCC’s fourth (2007) and fifth (2013) 
assessment reports. 

The researchers came up with three crucial 
findings. First, they observed that differences 
in mixing strength explained about half of the 
spread of climate sensitivities estimated by 
the models. Second, they found that changes 
in mixing strength depend on the mixing 
strength in simulations of the current cli- 
mate, which was used as the initial value in the 
experiments. And third, they conclude that 
estimates of current mixing strength based 
on observations imply a climate sensitivity of 
more than 3 °C, which is in the upper half of 
the IPCC’s range of estimates. 

Another recent study of constraints on 
the uncertainty of cloud responses, based on 
observational data, also suggested that higher 
climate sensitivities are more likely than lower 
ones. So can we declare the long-running 
debate about climate sensitivity to be over? 
Unfortunately not. Such sensitivity can also 
be inferred using observational data or using 
estimates of historical changes in surface- 
air temperature, heat intake by the ocean or 
Earth's radiative balance (the heating or cool- 
ing effects of anthropogenic greenhouse gases 
and aerosols). One such study, published last 
year, implies that climate sensitivities below 
2 °C cannot be ruled out, demonstrating that 
constraints on the uncertainty depend on the 
approaches used to determine them. 

There are many factors that could explain 
the discrepancy. Although the uncertainty 
Figure 1 | More or less cloudy. The maps depict estimates of changes in the coverage of low-level cloud in response to an abrupt quadrupling of CO₂ levels, 
compared with today’s levels, as determined by five of the most recent generation of climate models. Data represent mean values for the period 11-20 years after 
CO₂ quadrupling. Sherwood et al. propose processes that lead to substantial differences in the response of low-level cloud to changing CO₂ levels, and which 
help to explain the variation in climate sensitivities calculated by the models. (Graphic generated by H.S. and T.O.) 


about changes in low-level cloud over the 
tropical ocean contributes greatly to the uncer- 
tainty of climate sensitivity, uncertainties in 
other processes — such as changes in sea ice, 
water vapour, atmospheric temperature and 
cloud at other atmospheric levels and regions 
of the world — are also important. 

Sherwood and colleagues’ study repre- 
sents a big advance, but questions persist. For 
example, around half of the spread of climate 
sensitivities estimated in their study remains 
unexplained. Furthermore, there is no guar- 
antee that the available ensemble of climate 
models samples the full range of uncertainty, 
or that the results might not be skewed by common 
errors in most of the models. 

But although the authors’ approach may 
not provide all the answers, the alternative 
approach of analysing past changes also has 
considerable difficulties. There are substantial 
uncertainties in estimates of radiative balance, 
and observational data on surface-air tempera- 
ture and ocean heat intake suffer from limited 
spatial and temporal coverage, sampling biases 
and discontinuities associated with the use of 
different measurement instruments. For example, 
a study last year suggests that the global 
warming rate in the past 15 years has been 
underestimated because of the lack of obser- 
vations of sea surface temperatures over the 
Arctic region. 

For now, Sherwood et al. have proposed and 
tested a convincing mechanism that explains 
half of the spread of models’ climate sensi- 
tivities, and which suggests that future climate 
will be warmer than expected. The fact that 
their findings are variously consistent and 


inconsistent with those of other studies poses 
further challenges for wide areas of research, 
including observations and reconstructions 
of climate systems, understanding of the pro- 
cesses involved, climate modelling, and analy- 
ses of climate simulations. All will be needed to 
solve the recondite climate-sensitivity puzzle. 
The beginning of the end 


Studies in mice and humans suggest that cellular senescence, the cessation of cell 
proliferation that is known to suppress cancer and promote ageing, may have 
evolved to regulate embryonic development. 


JUDITH CAMPISI 


Cells that experience certain types of 
stress, particularly stress that is poten- 
tially cancer-causing, undergo an 
essentially permanent arrest of proliferation 
termed cellular senescence. Since its formal 
description in the 1960s, cellular senescence 
has been thought both to suppress the 


development of cancer and to promote ageing. 
Support for these roles has come from tumour 
studies in mice and humans, and from the 
realization that senescent cells secrete proteins 
that cause inflammation, a hallmark of ageing 
tissues. More recently, a complex inflammatory 
response called the senescence-associated 
secretory phenotype (SASP) was shown to 
facilitate tissue repair and remodelling, and 
to help the immune system recognize 
and eventually remove 
senescent cells. These multiple 
functions of cellular senescence 
are not mutually exclusive, but 
they raise an interesting teleologi- 
cal question: for what purpose 
did senescence evolve? Findings 
by Muñoz-Espín et al. and 
Storer et al., published in Cell, 
suggest a surprising answer: to 
fine-tune embryogenesis. 

Both research groups found 
evidence for the presence of senes- 
cent cells in mouse and human 
embryos. To identify these cells, 
the researchers initially relied on a 
commonly used marker of senes- 
cence, the activity of an enzyme 
known as senescence-associated 
β-galactosidase (SA-β-gal). Their 
combined results identified non-dividing 
SA-β-gal-containing 
cells in the embryonic kidney, the 
endolymphatic sac of the inner 
ear, developing limbs, the closing 
neural tube and the apical ecto- 
dermal ridge, among other struc- 
tures. Further analyses showed 
that non-dividing cells in these 
structures also expressed high 
levels of p21, a cell-cycle-inhibi- 
tor protein that is often expressed 
by senescent cells in culture and 
in postnatal tissues, and of 
a subset of SASP proteins, which are 
presumed to facilitate the infiltration of 
immune cells and eventual clearance of senes- 
cent cells (Fig. 1). 

Surprisingly, however, both groups found 
that non-dividing cells in these embryonic 
structures did not express p16, a cell-cycle-inhibitor 
and tumour-suppressor protein that 
is commonly produced by senescent cells in 
culture and in postnatal tissues; instead, they 
expressed p15, another cell-cycle inhibitor 
that is produced by only some non-embryonic 
senescent cells. Similarly, the cells showed no 
evidence of a DNA-damage response or acti- 
vation of p53, the tumour-suppressor and 
transcriptional-regulator protein that controls 
the senescence response to tissue damage or 
cancer-causing stress. The authors also show 
that senescence in the embryo depended on 
p21, whereas senescence in non-embryonic 
tissues depends primarily on p53 and p16. 
Moreover, p21 expression in the embryo was 
induced by two transcription factors, FOXO 
and SMAD, which are controlled by the PI3K 
and TGF-β signalling pathways; by contrast, 
induction of p21 during non-embryonic senes- 
cence is generally mediated by the DNA-dam- 
age response and p53. Thus, the senescence that 
occurs in embryos shares some, but not all, fea- 
tures of the senescence responses that suppress 
cancer and facilitate tissue repair (Fig. 1). 




Figure 1 | Senescence modules. Cellular senescence is involved in tumour 
suppression, tissue repair and, as shown by Muñoz-Espín et al. and Storer et al., 
embryogenesis. In all three cases, senescent cells express the protein p21 and 
proteins associated with the senescence-associated secretory phenotype (SASP); 
they also exhibit senescence-associated β-galactosidase (SA-β-gal) activity. 
However, during tumour suppression and tissue repair, senescence depends 
primarily on the activity of the proteins p53 and p16, 
induced as part of the DNA-damage response (DDR). By contrast, senescence 
during embryogenesis depends on p21, which is switched on by the transcription 
factors FOXO and SMAD. Senescent cells in the embryo also express p15. 
What functions do senescent cells serve in 
the embryo? The authors of both papers specu- 
late that the cells might fine-tune the devel- 
opment of tissue structures in the embryo, 
as proposed 20 years ago. In addition to 
curtailing their own proliferation, senescent 
cells secrete factors that have potent effects on 
other cells, including effects on apoptotic cell 
death, cell migration, immune-cell infiltra- 
tion and angiogenesis (the generation of new 
blood vessels). It was surprising, therefore, 
that the researchers found only a few pre- or 
postnatal abnormalities in mouse embryos 
rendered senescence-free by deletion of the 
gene encoding p21. Of course, embryos are 
remarkably plastic and, indeed, the authors’ 
analyses of the kinetics and structure of mor- 
phogenesis in the senescence-free embryos 
showed that other tissue-remodelling pro- 
cesses largely compensate for the lack of 
senescence. 

The results reported by Muñoz-Espín et al. 
and Storer et al. are consistent with their view 
that cellular senescence evolved to optimize 
embryogenesis, and that its beneficial post- 
natal functions (tumour suppression and tissue 
repair) arose later during evolution. However, 
the distinct but overlapping manifestations of 
senescence in embryonic and postnatal tissues 
need not be a consequence of sequential evo- 
lution. Rather, cells might be programmed to 
link arrested cell proliferation to 
other cellular responses, including 
a secretory phenotype, to meet a 
variety of physiological needs and 
respond to various forms of stress. 
This possibility would explain 
why some senescent states seem 
to depend primarily on p53, others 
on p16, yet others on p21, 
and so on. It might also explain 
why there are no markers that are 
unique to senescent cells. Finally, 
the idea that senescence responses 
are assemblies of cellular character- 
istics might explain why the SASP 
proteins differ depending on the 
senescence inducer, cell type and 
tissue of origin, including whether 
the senescent cells reside in postnatal 
tissues or the embryo. 
Regardless of the origin of cel- 
lular senescence, the deleterious 
(pro-ageing) effects of senescent 
cells are clearly maladaptive. 
The number of senescent cells 
increases with age in many tissues, 
possibly because they are incom- 
pletely eliminated by the immune 
system and/or produced in greater 
numbers in aged organisms. Con- 
sequently, aged tissues might 
suffer from an accumulation of 
non-dividing cells and the persis- 
tent presence of SASP factors that 
can promote chronic inflamma- 
tion and alter tissue structure and function. 
The findings that vertebrate embryos are 
replete with cells bearing characteristics of 
senescent cells open up possibilities for furthering 
our understanding of the relationship 
between embryonic and adult cells, and how 
tissue regeneration, tumour suppression and 
ageing are balanced. They also raise ideas for 
potential therapies. Is it possible, for example, 
to activate embryonic senescence programs 
to optimize tissue repair postnatally or even 
in aged adults? Answers to such questions 
depend, of course, on future research. 
Spread in model climate sensitivity 
traced to atmospheric convective mixing 


Steven C. Sherwood, Sandrine Bony & Jean-Louis Dufresne 


Equilibrium climate sensitivity refers to the ultimate change in global mean temperature in response to a change in 
external forcing. Despite decades of research attempting to narrow uncertainties, equilibrium climate sensitivity 
estimates from climate models still span roughly 1.5 to 5 degrees Celsius for a doubling of atmospheric carbon dioxide 
concentration, precluding accurate projections of future climate. The spread arises largely from differences in the 
feedback from low clouds, for reasons not yet understood. Here we show that differences in the simulated strength of 
convective mixing between the lower and middle tropical troposphere explain about half of the variance in climate 
sensitivity estimated by 43 climate models. The apparent mechanism is that such mixing dehydrates the low-cloud layer 
at a rate that increases as the climate warms, and this rate of increase depends on the initial mixing strength, linking the 
mixing to cloud feedback. The mixing inferred from observations appears to be sufficiently strong to imply a climate 
sensitivity of more than 3 degrees for a doubling of carbon dioxide. This is significantly higher than the currently 
accepted lower bound of 1.5 degrees, thereby constraining model projections towards relatively severe future warming. 


Ever since numerical global climate models (GCMs) were first developed 
in the early 1970s, they have exhibited a wide range of equilibrium 
climate sensitivities (roughly 1.5-4.5 °C warming per equivalent doubling 
of CO₂ concentration) and consequently a broad range of future 
warming projections, with the uncertainty due mostly to the range of 
simulated net cloud feedback. This feedback strength varies from roughly 
zero in the lowest-sensitivity models to about 1.2-1.4 W m⁻² K⁻¹ 
in the highest. High clouds (above about 400 hPa or 8 km) contribute 
about 0.3-0.4 W m⁻² K⁻¹ to this predicted feedback because the temperatures 
at the tops of the clouds do not increase much in warmer 
climates, which enhances their greenhouse effect. Mid-level cloud 
changes also make a modest positive-feedback contribution in most 
models. 

Another positive feedback in most models comes from low cloud, 
occurring below about 750 hPa or 3 km, mostly over oceans in the 
planetary boundary layer below about 2 km. Low cloud is capable of 
particularly strong climate feedback because of its broad coverage and 
because its reflection of incoming sunlight is not offset by a commensurate 
contribution to the greenhouse effect. The change in low cloud 
varies greatly depending on the model, causing most of the overall 
spread in cloud feedbacks and climate sensitivities among GCMs. 
No compelling theory of low cloud amount has yet emerged. 

A number of competing mechanisms have, however, been suggested 
that might account for changes in either direction. On the one hand, 
evaporation from the oceans increases at about 2% K⁻¹, which, all 
other things being equal, may increase cloud amount. On the other 
hand, detailed simulations of non-precipitating cloudy marine bound- 
ary layers show that if the layer deepens in a warmer climate, more dry 
air can be drawn down towards the surface, desiccating the layer and 
reducing cloud amount. 


The lower-tropospheric mixing mechanism 

We consider that a mechanism similar to this one, which has so far 
been considered only for a particular cloud regime, could apply more 
generally to shallow upward moisture transports, such as by cumulus 


congestus clouds or larger-scale shallow overturning found broadly 
over global ocean regions. Air lifted out of the boundary layer can 
continue ascending, rain out most of its water vapour, and then return 
to a relatively low altitude—or it can exit the updraught directly at the 
low altitude, retaining much more of its initial vapour content. The 
latter process reduces the “bulk precipitation efficiency” of convection, 
allowing greater transport of moisture out of the boundary layer for a 
given precipitation rate. Such a process can increase the relative humidity 
above the boundary layer and dry the boundary layer. Unlike the global 
hydrological cycle and the deep precipitation-forming circulations, 
however, it is not strongly constrained by atmospheric energetics. 

We present measures of this lower-tropospheric mixing and the 
amount of moisture it transports, and show that mixing varies sub- 
stantially among GCMs and that its moisture transport increases in 
warmer climates at a rate that appears to scale roughly with the initial 
lower-tropospheric mixing. 


Mixing-induced low cloud feedback 

The resulting increase in the low-level drying caused by lower-tropospheric 
mixing produces a mixing-induced low cloud (MILC) feedback of vari- 
able strength, which can explain why low-cloud feedback is typically 
positive and why it is so inconsistent among models. 

In a GCM, vertical mixing in the lower troposphere occurs in two 
ways (Extended Data Fig. 1). First, small-scale mixing of heat and water 
vapour within a single grid-column of the model is implied by con- 
vective and other parametrizations. Lower-tropospheric mixing and 
associated moisture transport would depend on transport by shallow 
cumulus clouds, but also on the downdrafts, local compensating sub- 
sidence and evaporation of falling rain that are assumed to accompany 
deeper cumulus. Second, large-scale mixing across isentropes occurs 
via explicitly resolved circulations. Whether this contributes to lower- 
tropospheric mixing will again depend on model parametrizations, 
but in this case, on their ability to sustain the relatively shallow heating 
that must accompany a shallow (lower-tropospheric) circulation. We 
measure these two mixing phenomena independently, starting with 
the small-scale part, and show that both phenomena progressively dry 
the boundary layer as climate warms. 


The small-scale component of mixing 

Lower-tropospheric mixing parametrized within a GCM grid cell 
cannot be directly diagnosed from model output (although it contri- 
butes to the convective terms in the water vapour budget; see below). 
We assert, however, that an atmosphere’s propensity to generate such 
mixing can be gauged by observing the thermal structure just above 
the boundary layer in ascending, raining regions. As discussed above, 
air there is either transported directly from the boundary layer with 
minimal precipitation via lower-tropospheric mixing, or indirectly by 
ascending in deeper, raining clouds and then descending. The air would 
arrive cool and humid in the former case, but warmer and drier in the 
latter case owing to the extra condensation, allowing us to evaluate 
which pathway dominates by observing mean-state air properties. 

To do this we use an index S, proportional to the differences ΔT_700-850 
and ΔR_700-850 of temperature and relative humidity between 700 hPa 
and 850 hPa (S taken as a linear combination; see Methods Summary) 
averaged within a broad ascending region which roughly coincides 
with the region of highest Indo-Pacific ocean temperatures (the Indo- 
Pacific Warm Pool; Fig. 1). Of the full set of 48 models used in this 
study, those with a less negative ΔT_700-850 in this region consistently 
show a more negative ΔR_700-850 there (Fig. 2a), and the variations in 
each quantity are quite large. We interpret this as strong evidence that 
both quantities are dominated by variations, evidently large, in the 
amount of lower-tropospheric mixing in the ascent region, with higher 
S indicating stronger mixing. 

Small-scale lower-tropospheric mixing of moisture is part of the 
overall source of the water vapour that is associated with the parametrized 
convection, M_small. This quantity is available from nine of 
the models (see Methods Summary). It always exhibits strong drying 


Figure 1 | Multimodel-mean local stratification parameter s. The index S is 
the mean of s within the regions outlined in white. Multimodel averages of s are 
shown separately for low-sensitivity (ECS < 3.0 °C) (a) and high-sensitivity 
(ECS > 3.5 °C) (b) models, among coupled models with known ECS. The white 
dots inside the S-averaging region show the locations of radiosonde stations 
used to help estimate S observationally. A few coastal regions that are off-scale 
appear white. 
Figure 2 | Basis for the index S of small-scale lower-tropospheric mixing 
and its relationship to the warming response. a, ΔT_700-850 versus ΔR_700-850, 
each averaged over a tropical region of mean ascent (see Fig. 1), from all 48 
coupled models. For reference, a saturated-adiabatic value of ΔT is shown by 
dotted line at −7.2 K, and a dry-adiabatic value (not shown) would be about 
−16 K. Error bars are 2σ ranges. b, Change in small-scale moisture source 
M_small below 850 hPa in the tropics upon +4 K ocean warming, versus S 
computed from the control run, in eight atmosphere models and one CMIP3 
model. Symbol colour indicates modelling centre or centre where atmosphere 
model was originally developed and symbol shape indicates model generation. 


near the surface. Above about 850 hPa, it can either dry the atmo- 
sphere on average or moisten it depending on the model (Extended Data 
Fig. 2), reflecting the competition between drying from condensation 
and moistening from lower-tropospheric mixing and from evaporat- 
ing precipitation falling from higher altitudes. 

Although M_small does not reflect lower-tropospheric mixing alone, 
we can test whether lower-tropospheric mixing (as diagnosed from S) 
affects how M_small responds as climate warms. The available data 
confirm that, given a +4 K warming, convective drying of the planetary 
boundary layer increases by 4-17 W m⁻² (6-30%), compared to 
a typical increase of 8% in global or tropical surface evaporation. The 
drying increase is highly correlated (r = −0.79) with S (Fig. 2b). Thus, 
convective dehydration of the planetary boundary layer outstrips the 
increase in surface evaporation with warming, in all models except 
those with the lowest S. Higher-sensitivity models also have higher S 
(Fig. 1), suggesting that this process drives a positive feedback on climate. 


The large-scale component of mixing 


We next turn to the large-scale lower-tropospheric mixing, which we 
associate with shallow ascent or flows of air upward through the top of 
the boundary layer that diverge horizontally before reaching the 
upper troposphere. Although air ascending on large scales over warm 
tropical oceans typically passes through nearly the whole troposphere, 
over cooler oceans its ascent often wanes with altitude, showing that 
this type of mixing indeed occurs in the Earth’s atmosphere (Fig. 3). 
The associated mid-level outflows are well documented for the central 


©2014 Macmillan Publishers Limited. All rights reserved 


and eastern Pacific and Atlantic Intertropical Convergence Zone and 
some monsoon circulations. Although these are indeed the regions 
where shallow ascent is steadiest, and hence clearest in monthly-mean 
data (Fig. 3), in daily reanalysis data, shallow ascent is equally strong 
outside the tropics owing largely to contributions from extratropical 
storms. We also note that although we focus here on regions of 
ascending air, that is because the ascending branches are where the 
circulations are easiest to measure; they must, however, descend else- 
where, exerting a net transport of water vapour that is upward and 
towards the convective regions. 

Figure 3 compares the observations with two example models. Neither 
model shows as much shallow ascent (red colour) as the observation- 
based estimates, but the Institut Pierre Simon Laplace (IPSL)-CM5A 
model comes closer. Although convective treatment in the newer IPSL- 
CM5B model is more detailed and produces better results in important 
respects, here it is seen to produce strong deep ascent of air (white 
spots) where it is weaker and shallower in observations (red zones), 
showing that improvement in some aspects of a simulation does not 
automatically improve others. 

We quantify the large-scale lower-tropospheric mixing more thor- 
oughly by calculating the ratio D of shallow to deep overturning (see 
Methods Summary) in a broad region encompassing most of the 
persistent shallow ascent (see Fig. 3). This index D varies by a factor 
of four across 43 GCMs (see below). Interestingly, however, D and S 
are uncorrelated (r = 0.01), confirming that the two scales of mixing 
are controlled by different aspects of model design. 

The effective source of moisture M_LT,large due to this shallow overturning 
(that is, large-scale, lower-tropospheric convection) and its 
change upon climate warming, can be directly calculated from model 
wind and humidity fields. We approximate M_LT,large using monthly-mean 
data from the ten available atmospheric models (see Methods 
Summary). M_LT,large isolates only shallow mixing, whereas M_small 




Figure 3 | The structure of monthly-mean tropospheric ascent reveals large- 
scale lower-tropospheric mixing in observations and models. Upward 
pressure velocity ω in one month (September) from the MERRA reanalysis 
(a), the IPSL-CM5A model (b) and the IPSL-CM5B model (c) with values at 
850 hPa shown in red and those at 500 hPa shown in green plus blue. Bright 
red implies ascent that is weighted toward the lower troposphere with 
includes the effects of all parametrized convection; yet despite this, 
the profiles of M_LT,large (Fig. 4) resemble those of M_small, with strong 
drying in the boundary layer and weak moistening above. Not unex- 
pectedly, these effects are greater in the high-D models than in the 
low-D ones. 

Crucially, the low-level drying also increases faster upon +4 K 
warming in the high-D models (by about 30%, or 1.5 W m⁻² K⁻¹ 
when expressed as a latent heat flux) than in the low-D models (25%, 
or 0.9 W m⁻² K⁻¹). Thus, the response of M_LT,large grows with D as 
M_small grew with S; the relationship for D is not as strong (r = 0.46 for 
land + ocean, r = 0.25 for ocean only), partly because the spread of D 
happens to be somewhat narrow among the available atmosphere 
models, but is still significant at 95% confidence. 


Climate sensitivity 


We now apply the indices S and D to the 43 GCMs for which an 
equilibrium climate sensitivity (ECS) is available. Each index inde- 
pendently explains about 25% of the variance in ECS (Fig. 5a, b). 

Because the ranges of D and S are similar (each 0.3-0.4), as are 
(approximately) those of their drying responses upon warming (see 
below), we form an overall lower-tropospheric mixing index (the 
LTMI) by simply adding the two: LTMI = S + D. This LTMI explains 
about 50% of the variance in total system feedback (r = 0.70) and ECS 
(r = 0.68) (Fig. 5c). Thus, although our measure of lower-tropospheric 
mixing does not explain all of the variations among GCMs, it does 
explain a significant portion of the model spread. 
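
The arithmetic linking the correlations to the explained variance can be sketched as follows. The sketch assumes that S and D have equal variance (the text notes only that their ranges are similar) and uses the standard relation that a linear fit with correlation r explains a fraction r² of the variance; the function names are hypothetical and the inputs are the rounded values quoted above, not model output.

```python
import math


def explained_variance(r: float) -> float:
    """Fraction of variance explained by a linear fit with correlation r."""
    return r * r


def corr_of_sum(r_s: float, r_d: float, r_sd: float = 0.0) -> float:
    """Correlation between (S + D) and a target variable.

    Assumes S and D have been scaled to equal variance (an assumption
    made here for illustration). r_s and r_d are the correlations of
    each index with the target; r_sd is the correlation between the
    two indices themselves.
    """
    if not -1.0 < r_sd <= 1.0:
        raise ValueError("r_sd must lie in (-1, 1]")
    return (r_s + r_d) / math.sqrt(2.0 + 2.0 * r_sd)


if __name__ == "__main__":
    r_each = math.sqrt(0.25)  # each index explains about 25% of the variance
    r_sum = corr_of_sum(r_each, r_each, r_sd=0.01)
    print(round(r_sum, 2), round(explained_variance(r_sum), 2))
    print(round(explained_variance(0.68), 2), round(explained_variance(0.70), 2))
```

Because D and S are almost uncorrelated, their contributions add nearly independently: two indices that each explain about a quarter of the variance combine to a correlation of roughly 0.70, or about half of the variance, in line with the values reported for the LTMI.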

This explanatory power derives primarily from low cloud feed- 
backs. The correlation between LTMI and the +4 K change in short- 
wave cloud radiative effect in the atmosphere models, which spans a 
range of 1.8 W m⁻² K⁻¹ in the tropics, is 0.65 in the tropics and 0.57 
in subsidence regions (equivalent values estimated from a subset of 
the coupled models providing the needed output are 0.25 and 0.47 




mid-tropospheric divergence (see colour scale), white implies deep ascent, and 
dark colours imply descent. In a, black lines outline the region in which the 
index D of large-scale lower-tropospheric mixing is computed. The Pacific and 
Atlantic Intertropical Convergence Zone regions are consistently red in the 
reanalyses and models, whereas isolated red patches in other areas tend to vary 
with time. 
Figure 4 | Estimated water vapour source $M_{\mathrm{LT,large}}$ due to large-scale lower- 
tropospheric mixing and its response to warming. See Methods for 
calculation details. Data are from ten atmosphere models, averaged from 30° S 
to 30° N over oceans, with the average of the four models having the largest D 
shown in magenta and the average of the four models with the smallest D 
shown in blue. Dashes show results in +4 K climate. Changes at +4 K are 
nearly identical whether or not land areas are included. 


respectively). These correlations suggest that the predictive skill of 
LTMI arises from both subsidence and other regions; further work 
is needed to better assess this. Cloud amount reduces more in high- 
LTMI models both at low and mid-levels (Extended Data Fig. 3), 
although the greater net radiative impact of low cloud makes its effect 
dominant’®. Previously reported water vapour and lapse-rate feed- 
backs” are, in contrast, not correlated with the LTMI. 

Is the imputed lower-tropospheric mixing impact on low clouds 
strong enough to explain the approximately 1.5 W m⁻² K⁻¹ spread of 
cloud feedbacks seen in GCMs?* One recent study’* imposed increased 
surface latent heat fluxes in a large region typified by shallow clouds, 
finding an increase in cloud-related net cooling of about 1 W m⁻² for a 
2-3 W m⁻² increase in the surface flux, other things held fixed. An 
even larger sensitivity, nearly 1:1, has been reported in a different 
model for advective changes in moisture input”. If a similar but opposite 
cloud response occurred for moisture removal by lower-tropospheric 
mixing, then to explain the feedback spread, the boundary-layer drying 
responses would need to span a range across models of about 3 W m⁻² 
per K of surface warming. This roughly matches the contribution to the 
spread from $M_{\mathrm{small}}$ alone (Fig. 2b). The additional drying response 
from $M_{\mathrm{LT,large}}$ was about 0.6 W m⁻² K⁻¹ greater in the high-D models 
(mean D of 0.34) than in the low-D ones (mean 0.24), which, if rescaled 
by the full spread of D in the full GCM ensemble, implies a further 
source of spread in drying response of about 2 W m⁻² K⁻¹. We 
conclude that, even if not all low clouds are as sensitive as the ones 
examined in the cited studies, the lower-tropospheric mixing response is 
strong enough to account for the cloud feedback spread and its 
typically positive sign’. 
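
The rescaling step in this argument can be sketched numerically. The snippet assumes a simple linear scaling of the drying response with D, and takes the full-ensemble spread of D to be 0.35, the midpoint of the 0.3-0.4 range quoted earlier; both are our assumptions for illustration, not values given by the authors for this calculation.

```python
def rescaled_spread(delta_response, d_high, d_low, d_full_range):
    """Scale a drying-response difference between two model groups
    to the full spread of D, assuming the response is linear in D."""
    slope = delta_response / (d_high - d_low)  # W m^-2 K^-1 per unit of D
    return slope * d_full_range


# 0.6 W m^-2 K^-1 difference between high-D (0.34) and low-D (0.24) groups,
# rescaled to an assumed full-ensemble D spread of 0.35.
extra_spread = rescaled_spread(0.6, 0.34, 0.24, 0.35)
print(f"implied spread in drying response: {extra_spread:.1f} W m^-2 K^-1")
```

Under these assumptions the result is close to the "about 2 W m⁻² K⁻¹" quoted in the text.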

Why does moisture transport increase so strongly with warming? 
The magnitude of these increases, typically 5%-7% per K of surface 
warming, is roughly what would be expected if the circulations remained 
similar against a Clausius–Clapeyron increase in moisture gradients”, 
as indeed it does, at least for the large-scale part”! (Extended Data Fig. 4). 
Further study is needed to understand why this is so, and to examine in 
greater detail how clouds respond to changing moisture transports; 
changes in low cloud amount may for example help the atmosphere 
restore imbalances in boundary layer moist enthalpy such as those caused 
by lower-tropospheric mixing'’. Because LTMI ignores any information 
on clouds, it is likely that additional measures of cloud characteristics” 
could explain some of the variations in low-cloud feedback not yet 
explained here. 
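
For orientation, the Clausius–Clapeyron scaling invoked above can be evaluated directly. The sketch uses the common approximation that saturation vapour pressure grows at a fractional rate L/(R_v T²) per kelvin; the constants are standard approximate values, and the choice of temperatures is ours.

```python
L_V = 2.5e6   # latent heat of vaporization, J kg^-1 (approximate)
R_V = 461.5   # specific gas constant for water vapour, J kg^-1 K^-1


def cc_fractional_rate(temperature_k):
    """Approximate fractional increase of saturation vapour pressure per K."""
    return L_V / (R_V * temperature_k ** 2)


for t in (280.0, 290.0, 300.0):
    print(f"T = {t:.0f} K: {100 * cc_fractional_rate(t):.1f}% per K")
```

For lower-tropospheric temperatures these rates fall inside the 5%-7% per K range quoted for the moisture-transport increase.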
Figure 5 | Relation of lower-tropospheric mixing indices to ECS. ECS versus 
S (a), D (b) and LTMI = S + D (c) from the 43 coupled models with known 
ECS. Linear correlation coefficients r are given in each panel (r = 0.70 in c is the 
correlation to the total system feedback). Error bars shown near panel axes 
indicate 2σ ranges of the direct radiosonde estimate (a) and the S value from 
radiosondes added to the D value from each of the two reanalyses (c). ERAi and 
MERRA are the two monthly reanalysis products. 


We end by considering observational estimates of S and D (see 
Fig. 5). These show an S near the middle of the GCM range, but a 
D close to the top end, as hinted already by Fig. 3. D may not be well 
constrained because ω must be inferred from observational 
reanalyses, although available horizontal wind observations support the 
existence of strong mid-level outflows!3, and the result is consistent 
across both reanalyses examined. The reanalysis estimates of S are less 
consistent but this quantity can be fairly well constrained by radio- 
sonde observations. 

Taking the available observations at face value implies a most likely 
climate sensitivity of about 4 °C, with a lower limit of about 3 °C. 
Indeed, all 15 of the GCMs with ECS below 3.0 °C have an LTMI 
below the bottom of the observational range. Further work may be 
needed to better constrain these indices, and to test whether their 
relationship to ECS is robust to design factors common to all models. 
For example, this should be tested in global cloud-resolving models. 
The possibility can never be ruled out that feedbacks could exist in 
nature that are missing from all models, which would change the 
climate sensitivity from that suggested by our result. Nonetheless, 
on the basis of the available data, the new understanding presented 
here pushes the likely long-term global warming towards the upper 
end of model ranges. 


Discussion 


Although a few previous studies have already noted that higher-sensitivity 
models simulate certain cloud-relevant phenomena better”*, ours is 
the first to demonstrate a causal physical mechanism, or to show 
consistent predictive skill across so many models, or to point to pro- 
cesses connecting low-cloud regions to the deep tropics. The MILC 
mechanism is surprisingly straightforward. Lower-tropospheric mixing 
dries the boundary layer, and the drying rate increases by 5-7% K⁻¹ in 
warmer climates owing to stronger vertical water vapour gradients. 
The moisture source from surface evaporation increases at only about 
2% K⁻¹. Thus as climate warms, any drying by lower-tropospheric 
mixing becomes larger relative to the rest of the hydrological cycle, 
tending to dry the boundary layer. How important this is depends on 
how important the diagnosed lower-tropospheric mixing was in the 
base state of the atmosphere. Lower-tropospheric mixing is unrealist- 
ically weak in models that have low climate sensitivity. 
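
The mismatch between the two growth rates can be made concrete with a small calculation. This is a sketch under our own assumptions: the rates are compounded per kelvin, and 6% K⁻¹ is taken as the midpoint of the quoted 5-7% K⁻¹ range.

```python
def relative_drying_growth(warming_k, drying_rate=0.06, evap_rate=0.02):
    """Ratio by which mixing-induced drying outgrows surface evaporation
    after `warming_k` kelvin, compounding both fractional rates per K."""
    return ((1.0 + drying_rate) / (1.0 + evap_rate)) ** warming_k


for rate in (0.05, 0.06, 0.07):
    ratio = relative_drying_growth(4.0, drying_rate=rate)
    print(f"drying at {100 * rate:.0f}% per K, +4 K: ratio = {ratio:.2f}")
```

For a +4 K warming the drying term grows by roughly one-tenth to one-fifth relative to evaporation, which is why its base-state magnitude matters so much.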

Climate-sensitivity-related differences in lower-tropospheric mix- 
ing, both at small (Fig. 1) and large scales (Fig. 3), are most detectable 
in regions of tropical deep or mixed-level convection and mean upward 
motion. This does not mean, however, that the greater low-level drying 
in a warmer climate or the spread of drying among models will be 
limited to these regions. 

Large-scale lower-tropospheric mixing carries water vapour not only 
upward but also horizontally away from subsidence regions; because both 
directions of transport intensify in a warmer atmosphere”’, subsidence 
regions should bear the brunt of the overall boundary-layer drying. 
Moreover, shallow ascent is equally strong (though more transient) in 
mid-latitude storm tracks and in the tropics, suggesting that MILC 
feedback may be just as important outside the tropics as within them. 

As for small-scale lower-tropospheric mixing, even though there are 
reasons to measure it in ascending regions (see Methods), its impact 
upon warming is much more widespread and differs significantly among 
models in subsiding regions (Extended Data Fig. 5). We hypothesise 
that this is because models with more small-scale lower-tropospheric 
mixing in ascending regions also have more in descending regions, 
although we cannot confirm this directly. Overall, the behaviour is con- 
sistent with published results showing that subsiding regions contribute 
strongly to the spread of cloud feedbacks in models, with storm tracks 
and tropical convective regions also playing a part'®°””. 

Lower-tropospheric mixing behaviour appears to result from a 
competition between shallow and deep convection in situations where 
either could occur. Such situations persist in many tropical regions, 
notably the Intertropical Convergence Zone. Understanding and 
properly representing this competition in climate models is undoubt- 
edly necessary for more accurate future climate projections. 

Although tested here on models used over the past decade or so, we 
presume that this mechanism has been a leading source of spread in 
sensitivity since the dawn of climate modelling. Finally to identify an 
atmospheric process that drives variations in climate sensitivity offers an 
unprecedented opportunity to focus research and model development 
in ways that should lead to more reliable climate change assessments. 


METHODS SUMMARY 


Data for computing S and D come from control runs of 48 models: 18 from the 
Coupled Model Intercomparison Project version 3 (CMIP3)”* and 30 from 
CMIP5 (ref. 29) (see Extended Data Tables 1 and 2). ECS was reported for all 
but one CMIP3 model by the Intergovernmental Panel on Climate Change”. For 
CMIP5 we employ effective climate sensitivities calculated from abrupt 4 × CO₂ 
experiments, available for 26 models, following a standard regression procedure*™*’. 

Data for $M_{\mathrm{small}}$ and $M_{\mathrm{LT,large}}$ come from ten CMIP5 atmosphere models providing 
‘amip’ (specified ocean surface temperature) control and +4 K ocean warming 
runs. Eight of these models provided $M_{\mathrm{small}}$; we also included data from the Parallel 
Climate Model (CMIP3). 

Observational estimates come from radiosondes and two monthly reanalysis 
products (ERAi and MERRA). Reanalyses are produced from a model con- 
strained to the fullest extent possible by a variety of observations”. 

We calculate S within a region where convective effects are a leading term in 
thermodynamic budgets, defined by the upper quartile of the annual-mean 
mid-tropospheric ascent rate where it is upward, $-\omega_{500}$ ($\omega$ being the pressure velocity). We 
define $S = (\Delta R_{700-850}/100\% - \Delta T_{700-850}/9\,\mathrm{K})/2$, which normalizes $\Delta R_{700-850}$ to 
100% humidity, $\Delta T_{700-850}$ to the approximately 9 K range between dry and 
saturated adiabatic values, and averages these two pieces of information with 
equal weight to reduce noise from other factors. 

To calculate $M_{\mathrm{LT,large}}$ we compute $\omega_1$ (the average of $\omega$ at 850 hPa and 700 hPa) 
and $\omega_2$ (the average of $\omega$ at 600 hPa, 500 hPa and 400 hPa). $\Delta = \omega_2 - \omega_1$ measures 
the local horizontal outflow in the lower troposphere above the boundary layer. 
Moisture is transported upward and outward wherever $\Delta > 0$ and $\omega_1 < 0$. We 
restrict measurement to tropical ocean regions from 160° W to 30° E (see Fig. 3). 
The moisture supplied to the environment is estimated as 
$M_{\mathrm{LT,large}} = -\langle q\,(d\omega/dp)\,H(\Delta)\,H(-\omega_1)\rangle$, where p is the pressure, q is the specific humidity, $\langle\ldots\rangle$ indicates 
the mean over the restricted region, and H is the step function. Finally, 
$D = \langle \Delta\,H(\Delta)\,H(-\omega_1)\rangle / \langle -\omega_2\,H(-\omega_2)\rangle$. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 


Data for computing S and D come from 48 models: 18 from the CMIP3 (Coupled 
Model Intercomparison Project version 3)”, the first two years of each “picntrl” 
run, and 30 models from the CMIP5 (ref. 29), the first two years of each “1pctCO2” 
run. Two years of data are sufficient to specify S and D to within 0.02 or better of their 
long-term values. CMIP3 data were obtained from the Australian National 
Computational Infrastructure node, and CMIP5 data including the ‘amip’ and 
‘amip+4K’ runs were obtained on 14 September 2012 and 22 October 2012 from 
the IPSL Ciclad repository. ECS values for CMIP3 were reported for all but one 
model by the Intergovernmental Panel on Climate Change”*. For CMIP5 we employ 
effective climate sensitivities calculated from abrupt 4 × CO₂ experiments, 
available for 26 of the 30 CMIP5 models, following a standard regression procedure”. 

Data for $M_{\mathrm{small}}$ and $M_{\mathrm{LT,large}}$ come from ten CMIP5 atmosphere models 
providing ‘amip’ (specified ocean surface temperature) control and +4 K ocean 
warming experiments. A key advantage of this experiment setup is that inter- 
annual ocean variability is the same in the control and warming runs, and changes 
in the sea surface temperature pattern—which could complicate interpretation, 
especially for circulation changes—are avoided. Data are from 1989-98, except 
for IPSL-CM5A, in which some of these years were corrupted and alternative 
years were used. Results from individual years were similar to those for the ten- 
year averages. Eight of these models provided $M_{\mathrm{small}}$; we also included data from 
the PCM CMIP3 1%-per-year-to-quadrupling experiment, with changes rescaled 
to the +4 K equivalent (actual change 3.3 K). PCM $M_{\mathrm{small}}$ data come from ten years 
near the beginning and ten years near the end of the 1%-per-year-to-quadrupling 
experiment, obtained from the National Center for Atmospheric Research node of 
the Earth System Grid. 

The shortwave cloud radiative effect is obtained by differencing the all-sky and 
clear-sky top-of-atmosphere shortwave fluxes for each model run. To calculate 
cloud feedback we first composite the sensitivity of the shortwave cloud radiative 
effect to sea surface temperature in dynamical regimes defined by vertical-mean 
vertical velocity, and then we compute the sum (weighted by the probability 
distribution function of $\omega$) over regimes (or only subsidence regimes defined 
by $\omega > 0$)’. For coupled models, the warming-induced change is obtained from 
abrupt CO₂-quadrupling experiments, after removing the instantaneous change 
associated with rapid adjustment to higher CO₂ estimated from the first 12 months 
after quadrupling. Only one realization is used per model. For atmosphere-only 
models it is simply the difference between the +4 K and the control simulations. 

Observational estimates come from radiosondes and from two monthly reana- 
lysis products (ERAi and MERRA), years 2009-10. The reanalyses are produced 
from a model constrained to the full extent possible by a variety of observations**”’. 
MERRA reanalysis data from 1 September 2009 were used to compare D inside and 
outside the tropics, but monthly data were used otherwise. Radiosonde data were 
obtained from the Integrated Global Radiosonde Archive and subjected to simple 
quality-control checks for outliers. The ten stations sited in the relevant region and 
meeting the criteria described by a previous study** were used, and the mean taken 
over the 2 years. The radiosonde network sampling bias, as determined from 
station-sampled reanalysis output, was relatively small compared to the overall 
reanalysis biases. 
We calculate S in ascending regions, where convective effects are a leading term 
in thermodynamic budgets; in subsidence regions humidity is sensitive to 
irrelevant non-local factors and even to numerical resolution’’, perhaps explaining 
why it is less informative for our purposes. The calculation region is defined by the 
upper quartile of the annual-mean mid-tropospheric ascent rate in ascending 
regions, $-\omega_{500}$ (where $\omega$ is the pressure velocity). We define 
$S = (\Delta R_{700-850}/100\% - \Delta T_{700-850}/9\,\mathrm{K})/2$, which normalizes $\Delta R_{700-850}$ to 100% humidity and 
$\Delta T_{700-850}$ to the approximately 9 K range between the dry and saturated adiabatic 
values, and then averages these two pieces of information with equal weight. Such 
averaging should reduce the noise from other factors that influence one quantity 
or the other. Varying the weighting of the two terms does not strongly affect results. 
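
The definition of S can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. It assumes that the differences are taken as the 700 hPa value minus the 850 hPa value, that relative humidity is in percent and temperature in kelvin, that the inputs are annual-mean fields on a common grid, and that an unweighted mean over the selected grid points is adequate; the function and argument names are hypothetical.

```python
import numpy as np


def mixing_index_s(rh700, rh850, t700, t850, omega500):
    """Small-scale mixing index S = (dR/100% - dT/9 K) / 2, averaged over
    the upper quartile of mid-tropospheric ascent (-omega500 > 0)."""
    ascent = -np.asarray(omega500, dtype=float)
    upward = ascent > 0
    # Upper quartile of the ascent rate among ascending points only.
    threshold = np.percentile(ascent[upward], 75)
    region = upward & (ascent >= threshold)
    d_rh = np.mean(np.asarray(rh700, float)[region] - np.asarray(rh850, float)[region])
    d_t = np.mean(np.asarray(t700, float)[region] - np.asarray(t850, float)[region])
    return 0.5 * (d_rh / 100.0 - d_t / 9.0)


# Tiny synthetic example (values are placeholders, not model output).
omega500 = np.array([-40.0, -30.0, -20.0, -10.0, 5.0])   # hPa per day
rh700 = np.array([50.0, 55.0, 60.0, 65.0, 30.0])          # %
rh850 = np.array([80.0, 80.0, 80.0, 80.0, 70.0])          # %
t700 = np.array([283.0, 283.0, 283.0, 283.0, 284.0])      # K
t850 = np.array([291.0, 291.0, 291.0, 291.0, 292.0])      # K
print(f"S = {mixing_index_s(rh700, rh850, t700, t850, omega500):.3f}")
```

Area weighting and the exact averaging period are left out here; they would need to follow the data sources described earlier in this section.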

To calculate $M_{\mathrm{LT,large}}$, we first compute $\omega_1$ (the average of $\omega$ at 850 hPa and 
700 hPa) and $\omega_2$ (the average of $\omega$ at 600 hPa, 500 hPa and 400 hPa). The difference 
$\Delta = \omega_2 - \omega_1$ then measures the local horizontal outflow in the lower troposphere 
above the boundary layer. Moisture is transported upward and outward at this 
level wherever $\Delta > 0$ and $\omega_1 < 0$. We restrict measurement to tropical ocean 
regions from 160° W to 30° E (see Fig. 3). The moisture supplied to the 
environment is then estimated as $M_{\mathrm{LT,large}} = -\langle q\,(d\omega/dp)\,H(\Delta)\,H(-\omega_1)\rangle$, where q is the 
specific humidity, $\langle\ldots\rangle$ indicates the mean over the restricted calculation region, 
and H is the step function. The index D is computed as 
$D = \langle \Delta\,H(\Delta)\,H(-\omega_1)\rangle / \langle -\omega_2\,H(-\omega_2)\rangle$. 
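
The index D lends itself to a short NumPy sketch. The version below is a hedged illustration rather than the authors' implementation: it assumes that the pressure-velocity fields have already been restricted to the tropical ocean region described above, that ω₁ and ω₂ are the layer averages defined in the text, and that optional area weights are supplied by the caller. The function names are hypothetical. The moisture source $M_{\mathrm{LT,large}}$ is not computed here because it additionally needs q and dω/dp.

```python
import numpy as np


def step(x):
    """Step function H(x): 1 where x > 0, otherwise 0."""
    return (np.asarray(x) > 0).astype(float)


def mixing_index_d(omega850, omega700, omega600, omega500, omega400, weights=None):
    """Large-scale mixing index D = <Delta H(Delta) H(-omega1)> / <-omega2 H(-omega2)>."""
    omega1 = (np.asarray(omega850, float) + np.asarray(omega700, float)) / 2.0
    omega2 = (np.asarray(omega600, float) + np.asarray(omega500, float)
              + np.asarray(omega400, float)) / 3.0
    delta = omega2 - omega1                           # lower-tropospheric outflow
    shallow = delta * step(delta) * step(-omega1)     # shallow ascent with outflow
    deep = -omega2 * step(-omega2)                    # mid-tropospheric ascent
    return np.average(shallow, weights=weights) / np.average(deep, weights=weights)


# Synthetic example: identical values within each layer, for easy checking.
low = np.array([-20.0, -10.0, 5.0, -30.0])    # omega at 850 and 700 hPa
mid = np.array([-10.0, -40.0, 10.0, -5.0])    # omega at 600, 500 and 400 hPa
print(f"D = {mixing_index_d(low, low, mid, mid, mid):.3f}")
```

In this toy case only the first and last points combine low-level ascent with outflow above the boundary layer, so only they contribute to the numerator.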

Values of D and S are similar over ten years of data or one year, and are similar 
whether individual months or long-term means for each month of the year are 
used. These indices capture over 25% of the ECS variance even if computed from 
only a single month of data from each model. Thus, long records are unnecessary 
for deducing the strength of lower-tropospheric mixing. 

The reason for restricting calculation of D to the cooler tropical longitudes is 
that a few climate models erroneously place much of the shallow ascent over 
warm oceans, where it does not seem to contribute as much to low-cloud feed- 
back. In observations, and in most models, the restriction has little effect because 
most of the shallow ascent persistent enough to appear in monthly-mean data is 
already located in the specified region. We speculate that the location of the ascent 
matters because the associated shallow descent is more relevant if it occurs over, 
or upstream of, regions of radiatively important low cloud. 

Both lower-tropospheric mixing indices retain statistically significant 
correlations with ECS for all alterations to their definitions that we tried. Specifically, the 
correlation of S with ECS ($r_{S\text{-ECS}}$) is similar with $\omega_{500}$ percentiles of 0.25 or 0.5, 
but drops with looser thresholds, which begin to pick up parts of the resolved 
lower-tropospheric mixing region. Tighter thresholds reduce the spread in S 
between models, reducing $r_{S\text{-ECS}}$. The correlation $r_{D\text{-ECS}}$ is somewhat weaker 
(as low as 0.3) if the longitudinal restriction for D is removed, or if other 
definitions of $\omega_1$ and $\omega_2$ are used. 


34. Sherwood, S. C., Meyer, M. L., Allen, R. J. & Titchner, H. A. Robust tropospheric 
warming revealed by iteratively homogenized radiosonde data. J. Clim. 21, 
5336-5352 (2008). 

35. Sherwood, S. C., Roca, R., Weckwerth, T. M. & Andronova, N. G. 
Tropospheric water vapor, convection and climate. Rev. Geophys. 48, RG2001 
(2010). 
Extended Data Figure 1 | Illustration of atmospheric overturning 
circulations. Deep overturning strongly coupled to the hydrological cycle and 
atmospheric energy budget is shown by solid lines; lower-tropospheric mixing 
is shown by dashed lines. The MILC feedback results from the increasing 
relative role of lower-tropospheric mixing in exporting humidity from the 
boundary layer as the climate warms, thus depleting the layer of water vapour 
needed to sustain low cloud cover. 
Extended Data Figure 2 | Small-scale moisture source $M_{\mathrm{small}}$. Vertical 
profile averaged over all tropical oceans, for two selected climate models (see 
legend) with very different warming responses, in present-day (solid) and +4 K 
(dashed) climates. 
Extended Data Figure 3 | Response of cloud fraction to warming. Profile of 
average change in model cloud fractional cover at +4 K in the four atmosphere 
models with largest (magenta) and smallest (blue) estimated +4 K increases in 
planetary-boundary-layer drying, averaged from 30° S to 30° N (dashed) or 
60° S to 60° N (solid). The drying estimate is obtained by adding the explicitly 
computed change in $M_{\mathrm{LT,large}}$ to the change in $M_{\mathrm{small}}$ estimated from S via the 
relationship shown in Fig. 2a. The typical mean cloud fraction below 850 hPa is 
about 10% to 20%, and the changes shown are absolute changes in this fraction, 
so are of the order of 10% of the initial cloud cover. 
Extended Data Figure 4 | Response of large-scale lower-tropospheric 
mixing to warming. Profiles of mean vertical velocity in regions of shallow 
ascent, in control and +4 K climates. The similarity of dashed and solid lines 
indicates that mass overturning associated with these regions is roughly the 
same in the warmer simulations, on average. 


©2014 Macmillan Publishers Limited. All rights reserved 




Extended Data Figure 5 | Response of small-scale, low-level drying to 
warming. Change in convective moisture source $M_{\mathrm{small}}$ below 850 hPa upon a 
+4 K warming in eight atmosphere models and one CMIP3 coupled model; 
units are W m⁻², with negative values indicating stronger drying near the 
surface. Zero contours are shown in white (a few off-scale regions also appear 
white). The models used for calculating $M_{\mathrm{LT,large}}$ are the eight shown here plus two 
for which $M_{\mathrm{small}}$ data were unavailable: CNRM-CM5 and FGOALS-g2. 
Extended Data Table 1 | List of CMIP5 coupled models used 

Model            Centre         Forcing (W m⁻²)   Total feedback (W m⁻² K⁻¹)   ECS (K) 
ACCESS1-0        ACCESS         3.01              -0.79                        3.79 
ACCESS1-3        ACCESS         2.96              -0.86                        3.45 
BCC-CSM1-1       BCC            3.35              -1.16                        2.88 
BNU-ESM          GCESS/BNU      3.78              -0.92                        4.11 
CanESM2          CCC            3.85              -1.05                        3.68 
CCSM4            NCAR           3.70              -1.27                        2.92 
CESM1-BGC        NCAR           -                 -                            - 
CESM1-CAM5       NCAR           -                 -                            - 
CMCC-CM          CMCC           -                 -                            - 
CNRM-CM5         CNRM           3.71              -1.14                        3.25 
CSIRO-Mk3-6-0    CSIRO/QCCCE    2.63              -0.66                        3.99 
FGOALS-g2        LASG/IAP       2.89              -0.84                        3.45 
FGOALS-s2        LASG/IAP       3.84              -0.92                        4.16 
GFDL-CM3         GFDL           3.00              -0.76                        3.96 
GFDL-ESM2G       GFDL           3.11              -1.31                        2.38 
GFDL-ESM2M       GFDL           3.41              -1.41                        2.41 
GISS-E2-H        GISS           3.83              -1.66                        2.30 
GISS-E2-R        GISS           3.77              -1.79                        2.11 
HadGEM2-ES       MOHC           2.95              -0.65                        4.55 
INMCM4           INM            2.98              -1.44                        2.07 
IPSL-CM5A-LR     IPSL           3.12              -0.76                        4.10 
IPSL-CM5B-LR     IPSL           2.66              -1.03                        2.59 
MIROC5           MIROC          4.16              -1.54                        2.71 
MIROC-ESM        MIROC          4.27              -0.92                        4.65 
MPI-ESM-LR       MPI            4.15              -1.15                        3.60 
MPI-ESM-MR       MPI            4.11              -1.20                        3.44 
MPI-ESM-P        MPI            4.35              -1.27                        3.42 
MRI-CGCM3        MRI            3.26              -1.26                        2.59 
NorESM1-ME       NCC            -                 -                            - 
NorESM1-M        NCC            3.21              -1.13                        2.83 

Centre acronyms used to identify them in scatter plots are also shown. The derived forcing, total feedback, and equilibrium climate sensitivities are given for models with abrupt 4 × CO₂ simulations. 
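
The three numerical columns of Extended Data Table 1 appear to be linked by ECS ≈ −F/λ, where F is the forcing and λ the total feedback. This is our reading of the tabulated numbers, not something the table note states, and the check below uses only a handful of rows copied from the table.

```python
# (forcing in W m^-2, total feedback in W m^-2 K^-1, tabulated ECS in K)
ROWS = {
    "ACCESS1-0": (3.01, -0.79, 3.79),
    "GFDL-CM3": (3.00, -0.76, 3.96),
    "HadGEM2-ES": (2.95, -0.65, 4.55),
    "INMCM4": (2.98, -1.44, 2.07),
    "MIROC-ESM": (4.27, -0.92, 4.65),
}


def ecs_from_forcing_and_feedback(forcing, feedback):
    """Equilibrium warming at which forcing and feedback balance: -F / lambda."""
    return -forcing / feedback


for model, (forcing, feedback, ecs_tabulated) in ROWS.items():
    ecs = ecs_from_forcing_and_feedback(forcing, feedback)
    print(f"{model:12s} -F/lambda = {ecs:.2f} K, tabulated = {ecs_tabulated:.2f} K")
```

Small differences from the tabulated ECS are expected because the forcing and feedback values are themselves rounded to two decimals.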
Extended Data Table 2 | List of CMIP3 coupled models used 

Model                 Centre   ECS (K) 
CCCMA-CGCM3-1         CCC      3.4 
CCCMA-CGCM3-1-T63     CCC      3.4 
GFDL-CM2-0            GFDL     2.9 
GFDL-CM2-1            GFDL     3.4 
GISS-MODEL-E-H        GISS     2.7 
GISS-MODEL-E-R        GISS     2.7 
IAP-FGOALS1-0-G       IAP      2.3 
INGV-ECHAM4           INGV     - 
INMCM3-0              INM      2.1 
IPSL-CM4              IPSL     4.4 
MIROC3-2-HIRES        MIROC    4.3 
MIROC3-2-MEDRES       MIROC    4.0 
MPI-ECHAM5            MPI      3.4 
MRI-CGCM2-3-2A        MRI      3.2 
NCAR-CCSM3-0          NCAR     2.7 
NCAR-PCM1             NCAR     2.1 
UKMO-HadCM3           MOHC     3.3 
UKMO-HadGEM1          MOHC     4.4 

Centre acronyms used to identify them in scatter plots are also shown, as are feedback values given by 
ref. 28. 
The complete genome sequence of a 
Neanderthal from the Altai Mountains 


Kay Prüfer, Fernando Racimo, Nick Patterson, Flora Jay, Sriram Sankararaman, Susanna Sawyer, Anja Heinze, 
Gabriel Renaud, Peter H. Sudmant, Cesare de Filippo, Heng Li, Swapan Mallick, Michael Dannemann, Qiaomei Fu, 
Martin Kircher, Martin Kuhlwilm, Michael Lachmann, Matthias Meyer, Matthias Ongyerth, Michael Siebauer, 
Christoph Theunert, Arti Tandon, Priya Moorjani, Joseph Pickrell, James C. Mullikin, Samuel H. Vohr, Richard E. Green, 
Ines Hellmann, Philip L. F. Johnson, Hélène Blanché, Howard Cann, Jacob O. Kitzman, Jay Shendure, Evan E. Eichler, 
Ed S. Lein, Trygve E. Bakken, Liubov V. Golovanova, Vladimir B. Doronichev, Michael V. Shunkov, 
Anatoli P. Derevianko, Bence Viola, Montgomery Slatkin, David Reich, Janet Kelso & Svante Pääbo 


We present a high-quality genome sequence of a Neanderthal woman from Siberia. We show that her parents were 
related at the level of half-siblings and that mating among close relatives was common among her recent ancestors. We 
also sequenced the genome of a Neanderthal from the Caucasus to low coverage. An analysis of the relationships and 
population history of available archaic genomes and 25 present-day human genomes shows that several gene flow events 
occurred among Neanderthals, Denisovans and early modern humans, possibly including gene flow into Denisovans 
from an unknown archaic group. Thus, interbreeding, albeit of low magnitude, occurred among many hominin groups 
in the Late Pleistocene. In addition, the high-quality Neanderthal genome allows us to establish a definitive list of 
substitutions that became fixed in modern humans after their separation from the ancestors of Neanderthals and 
Denisovans. 


In 2008, a hominin finger phalanx was discovered during excavation 
in the east gallery of Denisova Cave in the Altai Mountains. From this 
bone, a genome sequence was determined to ~30-fold coverage’. 
Analysis showed that it came from a previously unknown group of 
archaic humans related to Neanderthals which we named ‘Denisovans’. 
Thus, at least two distinct human groups, Neanderthals and the related 
Denisovans, inhabited Eurasia when anatomically modern humans 
emerged from Africa. In 2010, another hominin bone, this time a prox- 
imal toe phalanx (Fig. 1a), was recovered in the east gallery of Denisova 
Cave’*. Layer 11, where both the finger and the toe phalanx were found, 
is thought to be at least 50,000 years old. The finger was found in sublayer 
11.2, which has an absolute date of 50,300 ± 2,200 years (OxA-V-2359-16), 
whereas the toe derives from the lowest sublayer 11.4, and may thus be 
older than the finger (Supplementary Information sections 1 and 2a). 
The phalanx comes from the fourth or the fifth toe of an adult indi- 
vidual and its morphological traits link it with both Neanderthals and 
modern humans’. 


Genome sequencing 


In initial experiments to determine if DNA was preserved in the toe 
phalanx, we extracted and sequenced random DNA fragments. This 
revealed that about 70% of the DNA fragments present in the specimen 
aligned to the human genome. Initial inspection of the fragments with 
similarity to the mitochondrial (mt) genome suggested that its mtDNA 
was closely related to Neanderthal mtDNAs. We therefore assembled the 


full mitochondrial sequence by aligning DNA fragments to a complete 
Neanderthal mitochondrial genome* (Supplementary Information sec- 
tion 2b). A phylogenetic tree (Fig. 2a) shows that the toe phalanx mtDNA 
shares a common ancestor with six previously published Neanderthal 
mtDNAs’ to the exclusion of present-day humans and the Denisova 
finger phalanx. Among Neanderthal mtDNAs, the toe mtDNA is most 
closely related to the mtDNA from infant 1 from Mezmaiskaya Cave in 
the Caucasus’. 


Figure 1 | Toe phalanx and location of Neanderthal samples for which 
genome-wide data are available. a, The toe phalanx found in the east gallery of 
Denisova Cave in 2010. Dorsal view (left image), left view (right image). Total 
length of the bone is 26 mm. b, Map of Eurasia showing the location of Vindija 
Cave, Mezmaiskaya Cave and Denisova Cave, where Neanderthal samples used 
here were found. 
Figure 2 | Phylogenetic relationships of the Altai Neanderthal. a, Bayesian 
tree of mitochondrial sequences of the toe phalanx, the Denisovan finger 
phalanx, six Neanderthals and five present-day humans. Posterior probabilities 
are given for branches whose support is less than one (Supplementary 
Information section 2b). b, Neighbour-joining tree based on autosomal 
transversion differences among the toe phalanx, four Neanderthals, the 
Denisova genome and seven present-day human individuals. Bootstrap values 
are shown for branches supported by less than 100% of 1,000 bootstrap 
replicates (Supplementary Information section 6). 


We generated four DNA libraries using a recently published protocol 
that is particularly efficient in retrieving DNA from ancient samples’”. 
These libraries, together with one library prepared using a previous 
protocol*, were treated with uracil-DNA glycosylase to remove uracil 
residues, a common miscoding lesion in ancient DNA that results from 
the deamination of cytosine’ "' (Supplementary Information section 5a). 
In total, these five DNA libraries provided 52-fold sequence coverage of 
the genome. We estimated present-day human DNA contamination 
in the libraries with four complementary approaches (Supplementary 
Information section 5) using mtDNA and nuclear DNA and conclude 
that present-day human contamination among the DNA fragments 
sequenced is around 1%. After genotype calling, which is designed to be 
insensitive to low levels of error, we expect that the inferred genome 
sequence is largely free from contamination. 


Relationship to other hominins 


We compared the toe phalanx genome to the Denisovan genome’, the 
draft Neanderthal genome of 1.3-fold coverage determined from three 
individuals from Vindija Cave, Croatia'’, the genome of a Neanderthal 
infant estimated to be 60,000 to 70,000 years old’* from Mezmaiskaya 
Cave in the Caucasus that we sequenced to 0.5-fold genomic coverage 
(Supplementary Information section 1; Fig. 1b) as well as 25 genomes 
of present-day humans: 11 previously sequenced to between 24- and 
31-fold coverage’ (Panel A), and 14 sequenced to between 35- and 42- 
fold coverage for this study (Panel B). We used pooled fosmid sequen- 
cing to resolve the sequences of the two chromosomes carried by 13 of 
these individuals’* (Supplementary Information section 4). 


Table 1 | Dating for branch shortening and population splits 


A neighbour-joining tree (Fig. 2b) based on transversions, that is, 
purine-pyrimidine differences, among 7 present-day humans and the 
low-coverage Mezmaiskaya and Vindija genomes corrected for errors 
(Supplementary Information section 6a), shows that the toe phalanx 
nuclear genome forms a clade with the genomes of Neanderthals. The 
average DNA sequence divergence between the toe phalanx genome 
and the Mezmaiskaya and Vindija Neanderthal genomes is approxi- 
mately a third of that between the Neanderthal and Denisova genomes. 
We conclude that the individual from whom the toe phalanx derives is 
a Neanderthal. Hereafter we refer to it as the ‘Altai Neanderthal’. 


Branch shortening 

The lengths of the branches leading from the common ancestor shared 
with chimpanzee to the high-coverage Altai Neanderthal and Denisovan 
genomes are 1.02% (range of point estimates: 0.99-1.05%) and 0.81% 
(range: 0.77-0.84%) shorter, respectively, than the branches to the present- 
day human genomes (Table 1; Supplementary Information section 6b). 
This is expected because the archaic genomes ceased accumulating 
substitutions at the death of the individual tens of thousands of years 
ago. We previously estimated the shortening of the Denisovan lineage 
to be 1.16% (range: 1.13-1.27%)'. The fact that using present-day human 
genomes of higher quality and more stringent quality filtering reduces 
the estimate by about a third shows that at present, estimates of lineage 
lengths are unstable, probably owing to differences in the error rates 
among the genomes used. Nevertheless, the fact that the Neanderthal 
lineage is about 20% shorter than the Denisovan lineage suggests that 
the Neanderthal toe phalanx is older than the Denisovan finger phal- 
anx, consistent with the stratigraphy of the cave. 
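
As a rough illustration of how a branch shortening expressed as a percentage of human-chimpanzee divergence translates into an age, the sketch below multiplies the percentage by an assumed human-chimpanzee divergence time. The two divergence times (6.5 and 13 million years) are the calibrations given in the notes to Table 1; the function name is hypothetical, and because the published ranges were presumably computed from unrounded percentages, the results should only be expected to agree with Table 1 to within about a thousand years.

```python
def shortening_to_kyr(percent_of_divergence, divergence_myr):
    """Convert a branch shortening (% of human-chimp divergence) to kyr."""
    return percent_of_divergence / 100.0 * divergence_myr * 1000.0

# Ranges of point estimates quoted in the text (percent of divergence).
shortening = {
    "Altai Neanderthal": (0.99, 1.05),
    "Denisovan": (0.77, 0.84),
}

for name, (low, high) in shortening.items():
    for divergence_myr in (6.5, 13.0):  # the two calibrations in Table 1
        lo_kyr = shortening_to_kyr(low, divergence_myr)
        hi_kyr = shortening_to_kyr(high, divergence_myr)
        print(f"{name}, {divergence_myr} Myr: {lo_kyr:.1f}-{hi_kyr:.1f} kyr")
```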


Population split times 

Figure 2b reflects the average divergence between DNA sequences. 
The times when the ancestral populations of archaic and modern 
humans separated are by necessity earlier. We used two approaches 
to estimate these population split times (Supplementary Information 
section 12). We caution that for these and other age estimates we rely 
on dates for the divergence of human and chimpanzee DNA sequences 
that in turn depend on the human mutation rate, which is currently 
controversial. In the text we present estimates based on a mutation rate
of 0.5 × 10⁻⁹ per base pair per year, estimated from comparisons of the
genomes of parents and children. In Table 1 we also present estimates
based on a rate of 1.0 × 10⁻⁹ per base pair per year derived from the fossil
record, which was used in previous studies of archaic genomes. We
also caution that the split times are at best approximate because the
models of population history used are likely to be inaccurate.
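
The link between the two mutation rates and the two human-chimpanzee divergence dates can be made explicit. The sketch below assumes the simple relation d = 2μT (divergence d accumulates along both lineages since a common ancestor at time T) and uses the 1.30% human-chimpanzee divergence quoted in the notes to Table 1. The relation ignores ancestral polymorphism and rate variation, and the function names are illustrative only.

```python
HUMAN_CHIMP_DIVERGENCE = 0.0130  # 1.30% per-site divergence (Table 1 notes)

def divergence_time_years(divergence, mu_per_bp_per_year):
    """Time to the common ancestor under d = 2 * mu * T."""
    return divergence / (2.0 * mu_per_bp_per_year)

def implied_mutation_rate(divergence, time_years):
    """Mutation rate per bp per year implied by d = 2 * mu * T."""
    return divergence / (2.0 * time_years)

for mu in (1.0e-9, 0.5e-9):
    t = divergence_time_years(HUMAN_CHIMP_DIVERGENCE, mu)
    print(f"mu = {mu:.1e} per bp per year -> T = {t / 1e6:.1f} million years")
```

Halving the mutation rate doubles every absolute date, which is why the two calibration columns of Table 1 differ by a factor of two.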

Columns: (1) as % of human-chimp divergence; (2) absolute date calibration
number 1, in kyr (μ = 1 × 10⁻⁹ per bp per year); (3) absolute date calibration
number 2, in kyr (μ = 0.5 × 10⁻⁹ per bp per year); (4) Supplementary
Information section.

Event                                     (1)          (2)        (3)        (4)
Altai Neanderthal branch shortening       0.99-1.05    64-68      129-136    6b
Denisova branch shortening                0.77-0.84    50-54      100-109    6b
San-West African split                    0.66-1.00    43-65      86-130     12
Introgressing Neanderthal-Altai split     0.58-0.88    38-57      77-114     13
Introgressing Denisovan-Denisovan split   2.12-3.10    138-202    276-403    13
Neanderthal-Denisova split*               2.93-3.64    190-236    381-473    12
Archaic-African split*                    4.23-5.89    275-383    550-765    12
Unknown archaic split                     7.90-31.12   450-2027   900-4054   16a, b

This table gives date ranges for two calibrations. The first assumes
human-chimpanzee divergence of 6.5 million years and 1.30% for human-chimp
divergence, or a mutation rate of 1 × 10⁻⁹ per bp per year. The second is
based on direct measurement of per generation mutation rates, corresponding
to a mutation rate of 0.5 × 10⁻⁹ per bp per year or 13 million years ago for
human-chimpanzee divergence, and may fit better with some aspects of the
fossil record. Intervals give the range of values over tested human genomes
for branch shortening; lowest and highest estimate for two or three methods
for San-West African, Neanderthal-Denisova, Neanderthal-African and
Denisova-African split; jackknife confidence interval over introgressed
chunks for the Introgressing Archaic-Archaic splits; and a union of the
jackknife confidence interval in Supplementary Information section 16a and
the highest posterior mode in Supplementary Information section 16b for the
unknown archaic split.

*The indicated values are corrected for branch shortening where relevant as described in the Supplementary Information.

We first estimated population split times by extending the pairwise
sequentially Markovian coalescent model (PSMC) to estimate the
distribution of coalescence times between two single chromosomes
that come from different populations (Supplementary Information
section 12). Using Sub-Saharan African genomes that were experimentally
phased (Supplementary Information section 14) and segments of the
archaic genomes in which the two chromosomes within an individual 
are closely related, we estimate the population split time between mod- 
ern humans on the one hand, and Neanderthals and Denisovans on the 
other, to between 553,000 and 589,000 years ago, and the split time 
between Neanderthals and Denisovans to 381,000 years ago (Sup- 
plementary Information section 12). 

In a second approach we counted how often randomly chosen alleles
in an individual from one population are derived (that is, different from
the apes) at positions where both the derived and ancestral alleles are
seen in an individual from a second population. Such derived alleles
will be less frequent the older the population separation time is because
more derived alleles in the second population will then be due to
mutations that occurred after the split. Using this approach and the
demographic history inferred from the PSMC (Supplementary Infor- 
mation section 12), we estimate the population split of Neanderthals and 
Denisovans from modern humans to 550,000-765,000 years ago, and the 
split time of Neanderthals and Denisovans to 445,000-473,000 years ago. 
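
The counting statistic behind the second approach can be sketched as follows. Genotypes are coded as the number of derived alleles (0, 1 or 2) at each site; this encoding, the function name and the toy data are assumptions made for illustration. Turning the statistic into a split time additionally requires the demographic model described in the Supplementary Information, which is not reproduced here.

```python
def derived_allele_fraction(genotypes_a, genotypes_b):
    """Expected fraction of derived alleles when one allele is drawn at random
    from individual A at sites where individual B is heterozygous.

    Each genotype is the count of derived alleles (0, 1 or 2) at a site.
    """
    informative = 0
    derived_expectation = 0.0
    for g_a, g_b in zip(genotypes_a, genotypes_b):
        if g_b != 1:  # B must carry both the derived and the ancestral allele
            continue
        informative += 1
        derived_expectation += g_a / 2.0  # chance a random allele from A is derived
    if informative == 0:
        raise ValueError("no sites where individual B is heterozygous")
    return derived_expectation / informative

# Toy genotypes for two individuals at six sites.
individual_a = [0, 1, 2, 0, 0, 1]
individual_b = [1, 1, 1, 0, 2, 1]
print(derived_allele_fraction(individual_a, individual_b))
```

The older the split between the two populations, the smaller this fraction is expected to be, which is the intuition stated in the text.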


Inbreeding 

Figure 3 | Indications of inbreeding in the Altai Neanderthal individual.
a, Time since the most recent common ancestor in log-scale for the two
alleles of a French, the Denisovan and the Altai Neanderthal individual
(Supplementary Information section 12) along 40 Mb of chromosome 21.
b, Pedigrees showing four possible scenarios of parental relatedness for the
Altai Neanderthal (that is, the child at the bottom of each pedigree). Two
additional scenarios can be derived by switching the sex of the parents for
the panels marked with an asterisk. c, Fraction of the genome in runs of
homozygosity between 2.5 and 10 cM in length for Altai Neanderthal,
Denisovan and the three present-day human individuals with the largest
fractions (grey bars). The fractions for the Altai Neanderthal (bottom four bars)
are reduced by the fraction expected from the four inbreeding scenarios in
b. Error bars represent the full range of values obtained from 700 simulations
for each scenario.

We noticed that the Altai Neanderthal genome contains several long
runs of homozygosity, indicating that her parents were closely related
(Fig. 3a). To estimate the extent of their relatedness, we scanned the
genome for 1 Mb regions where most non-overlapping 50-kb windows
were devoid of heterozygous sites and merged adjacent regions
(Supplementary Information section 10). The Neanderthal genome has 20
such regions longer than 10 cM, whereas the Denisovan genome has
one. We performed simulations of inbreeding scenarios that can result 
in regions of this number and length, and find that the inbreeding 
coefficient is 1/8, indicating that the parents were as closely related 
as half-siblings. As the Altai individual is a female (Supplementary
Information section 5) and the X chromosome also has long runs of 
homozygosity, we can exclude parental relationships in which none or 
only one of the two X chromosomes was inherited from closely-related 
common ancestor(s), that is, scenarios that include two successive 
males in the pedigree. We conclude that the parents of this Neander- 
thal individual were either half-siblings who had a mother in common, 
double first cousins, an uncle and a niece, an aunt and a nephew, a 
grandfather and a granddaughter, or a grandmother and a grandson 
(Fig. 3b). 
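
That each of the parental relationships listed above corresponds to an inbreeding coefficient of 1/8 can be checked with Wright's path formula, F = Σ (1/2)^(n1 + n2 + 1), summed over the common ancestors of the two parents, where n1 and n2 are the numbers of generations from the father and the mother to that ancestor. The sketch below assumes that the common ancestors are themselves not inbred and that uncles and aunts are full siblings of the relevant parent; the function and variable names are illustrative.

```python
from fractions import Fraction

def inbreeding_coefficient(paths):
    """Wright's path formula, assuming non-inbred common ancestors.

    `paths` lists, for every common ancestor of the two parents, the number
    of generations from the father and from the mother up to that ancestor.
    """
    return sum(Fraction(1, 2) ** (n_father + n_mother + 1)
               for n_father, n_mother in paths)

scenarios = {
    "half-siblings": [(1, 1)],                  # one shared parent
    "double first cousins": [(2, 2)] * 4,       # four shared grandparents
    "uncle and niece": [(1, 2)] * 2,            # the uncle's two parents
    "aunt and nephew": [(2, 1)] * 2,            # the aunt's two parents
    "grandfather and granddaughter": [(0, 2)],  # the grandfather himself
    "grandmother and grandson": [(2, 0)],       # the grandmother herself
}

for name, paths in scenarios.items():
    print(f"{name}: F = {inbreeding_coefficient(paths)}")
```

The formula alone does not distinguish these scenarios; in the text that distinction comes from the simulations and from the homozygosity of the X chromosome.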

To investigate whether mating between closely related individuals 
may have been typical of the Altai Neanderthal population, we exam- 
ined the distribution of runs of homozygosity between 2.5 and 10 cM 
in length. After removing the runs expected from recent inbreeding, 
the Altai Neanderthal genome still contains more runs than the Denisovan
genome (P < 2.2 × 10⁻¹⁶), and both archaic genomes contain more
than the Karitiana, a present-day population known to have a small
effective size (Fig. 3c; Supplementary Information 10). The sequencing
of additional Neanderthal genomes to high quality will address
whether breeding among close relatives was common also among 
Neanderthals in other geographic areas. 
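
A minimal sketch of the window scan described earlier in this section (1-Mb regions in which most non-overlapping 50-kb windows lack heterozygous sites, with adjacent regions merged) is given below. It assumes 0-based positions of heterozygous sites on a single chromosome, tiles the chromosome into non-overlapping 1-Mb regions, and reads "most" as more than half; these choices and the function name are illustrative assumptions rather than the procedure of Supplementary Information section 10. Converting the merged regions to genetic length (cM) needs a recombination map and is not shown.

```python
def homozygous_regions(het_positions, chrom_length,
                       window=50_000, region=1_000_000, min_fraction=0.5):
    """Return merged (start, end) intervals of candidate runs of homozygosity."""
    n_windows = -(-chrom_length // window)  # ceiling division
    has_het = [False] * n_windows
    for pos in het_positions:               # 0-based positions < chrom_length
        has_het[pos // window] = True

    per_region = region // window           # 20 windows per 1-Mb region
    merged = []
    for first in range(0, n_windows, per_region):
        block = has_het[first:first + per_region]
        empty_fraction = block.count(False) / len(block)
        if empty_fraction <= min_fraction:
            continue                         # too many windows contain het sites
        start = first * window
        end = min(start + region, chrom_length)
        if merged and merged[-1][1] == start:  # adjacent to the previous region
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

# Toy chromosome of 4 Mb: heterozygous sites in every window of the first and
# last megabase, none in between.
toy_hets = [25_000 + 50_000 * i for i in range(20)]
toy_hets += [3_025_000 + 50_000 * i for i in range(20)]
print(homozygous_regions(toy_hets, 4_000_000))
```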


Heterozygosity and population size 


The Neanderthal autosomal genome carries 1.7-1.8 heterozygous sites
per 10,000 bp (Supplementary Information section 9). This is 84% of
the number of heterozygous sites in the Denisovan genome, 22-30% of
that in present-day non-African genomes, and 16-18% of that in
present-day African genomes (Extended Data Fig. 1). When regions of
homozygosity longer than 2.5 cM stemming from recent as well as
long-term inbreeding in the Neanderthal are removed, 2.1-2.2 sites
per 10,000 are heterozygous, similar to what is observed in the Denisovan
genome. Thus, heterozygosity in Neanderthals as well as Denisovans
appears to have been lower than in present-day humans and is among
the lowest measured for any organism.

The demographic history of the population can be reconstructed 
from the distribution of the times since the most recent common 
ancestor of the two copies of the genome that a single person carries. 
We use the PSMC to infer changes in the size of the Neanderthal
population over time and compare this to inferences from the Denisovan 
and present-day human genomes (Fig. 4) (Supplementary Information 
section 12). All genomes analysed show evidence of a reduction in 
population size that occurred sometime before 1.0 million years ago. 
Subsequently, the population ancestral to present-day humans increased 
in size, whereas the Altai and Denisovan ancestral populations decreased 
further in size. It is thus clear that the demographic histories of both 
archaic populations differ substantially from that of present-day humans. 


Neanderthal gene flow into modern humans 


We have previously shown that Neanderthals contributed parts of their
genomes to present-day populations outside Africa and that Denisovans
contributed to the genomes of present-day populations in Oceania
(used here to refer to Australia, Melanesia and the Philippines). First,
using the high-coverage Neanderthal genome in conjunction with the two
other Neanderthal genomes, we now estimate that the proportion of
Neanderthal-derived DNA in people outside Africa is 1.5-2.1%
(Supplementary Information 14; Extended Data Table 1). Second, we find
that the Neanderthal-derived DNA in all non-Africans is more closely
related to the Mezmaiskaya Neanderthal from the Caucasus than it is
to either the Neanderthal from Siberia (Extended Data Table 2;
Supplementary Information section 14) (Z-score range: 4.0-6.4) or to the
Vindija Neanderthals from Croatia (Z-score range: 1.7-3.9). These
results cannot be explained by present-day human contamination in
the Mezmaiskaya Neanderthal data, as a contamination level on the
order of 2.0-5.4% would be needed to account for the excess relatedness
to the Mezmaiskaya Neanderthal, whereas the contamination in
the Mezmaiskaya data is estimated to be 0-1.1% (Supplementary
Information 5a).

Figure 4 | Inference of population size change over time. The y axis specifies
a number proportional to the population size Nₑ. The x axis specifies time in
units of divergence per base pair (along the top in years for mutation rates of
0.5 × 10⁻⁹ to 1.0 × 10⁻⁹ per site per year). The analysis assumes that the
Neanderthal and Denisova remains are of the same age, whereas archaeological
evidence and the branch shortening indicate that the Neanderthal bone is
older than the Denisovan bone. However, because the exact difference in
ages is not known, it is not possible to determine whether the reduction
in population size experienced by both archaic groups (but not by modern
humans) coincided in time.


Denisovan gene flow in mainland Asia 


We used the two high-coverage archaic genomes and a hidden Markov
model (HMM) to identify regions of specifically Neanderthal and
specifically Denisovan ancestry in 13 experimentally phased present-day
human genomes (Supplementary Information sections 4 and 13). In
the Sardinian and French genomes from Europe we find genomic regions 
of Neanderthal origin and few or no regions of Denisovan origin. In 
contrast, in the Han Chinese, the Dai in southern China, and the Karitiana 
and Mixe in the Americas, we find, in addition to regions of Neanderthal 
origin, regions that are consistent with being of Denisovan origin
(Z-score = 4.3 excess relative to the Europeans) (Supplementary Information
section 13), in agreement with previous analysis based on low-coverage
archaic genomes. These regions are also more closely related to the
Denisova genome than the few regions identified in Europeans (Sup- 
plementary Information section 13). We estimate that the Denisovan 
contribution to mainland Asian and Native American populations is 
~0.2% and thus about 25 times smaller than the Denisovan contribution
to populations in Papua New Guinea and Australia. The failure
to detect any larger Denisovan contribution in the genome of a
40,000-year-old modern human from the Beijing area suggests that any
Denisovan contribution to modern humans in mainland Asia was always
quantitatively small. In fact, we cannot, at the moment, exclude that the
Denisovan contribution to people across mainland Asia is due to
gene flow from ancestors of present-day people in Oceania after they
mixed with Denisovans. We also note that in addition to this Denisovan
contribution, the genomes of the populations in Asia and America
appear to contain more regions of Neanderthal origin than populations
in Europe (Supplementary Information sections 13 and 14).


Archaic population differentiation 


To estimate how closely related the archaic populations that contrib- 
uted DNA to present-day humans were to the archaic individuals from 
which high-coverage genomes have been determined, we compared
the regions of Neanderthal and Denisovan ancestry in present-day
human genomes identified by an HMM to the sequenced archaic 
genomes (Supplementary Information section 13). We find that the 
DNA sequence divergence in the regions that are most similar between 
the Altai Neanderthal genome and the Neanderthals that contributed 
DNA to present-day Eurasians is ~1.35% of the human-chimpanzee 
divergence, whereas the regions with the smallest sequence divergence 
between the Denisovan genome and the Denisovans that contributed 
DNA to present-day Papuans and Australians is ~3.18%. Regions of 
similarly low divergence are also identified by a window-based com- 
parison (Fig. 5). 

We estimate the population split time between the introgressing 
Neanderthal and the Altai Neanderthal genome to 77,000-114,000 years 
ago, and the split time between the introgressing Denisovan and the 
Denisovan genome to 276,000-403,000 years ago (Supplementary Infor- 
mation section 13) (Table 1). This is consistent with the Denisovan 
population being larger, more diverse and/or more subdivided than 
Neanderthal populations, and with the idea that Denisovans may have 
populated a wide geographical area. It is also in agreement with the low 
diversity among Neanderthal nuclear and mitochondrial genomes.


Neanderthal gene flow into Denisovans 


If gene flow occurred between Neanderthals and Denisovans, we would 
expect that regions of the genome where the divergence between Denisovan 
and Neanderthal haplotypes is low would carry many differences between 
the two haplotypes of the individual who harbours the introgressed 
genetic material. This is because this individual carries two haplotypes 
that have accumulated differences independently in the two popula- 
tions. In contrast, in the absence of gene flow, regions of low divergence 
between a Neanderthal and a Denisovan haplotype are not expected to 
have particularly elevated diversity (Supplementary Information sec- 
tion 15). 

We plotted the number of differences between the Neanderthal gen- 
ome and the closest inferred DNA sequences in the Denisovan genome 
against Denisovan heterozygosity (Fig. 6). We find that Denisovan 
heterozygosity is increased in regions where the Neanderthal and one 
Denisovan allele are close, indicating that gene flow from Neanderthals 
into Denisovans occurred, and estimate that a minimum of 0.5% of the 
Denisovan genome was contributed by Neanderthals. The Denisovan 
genome shares more derived alleles with the Altai Neanderthal genome 
than with the Croatian or Caucasus Neanderthal genomes (Z-score 
range: 5.6-10.2) (Extended Data Table 2; Supplementary Information 
section 15), suggesting that the gene flow into Denisovans came from a
Neanderthal population more related to the Altai Neanderthal than to
the other two Neanderthals. In the reciprocal analysis, we find no
corresponding increase in Neanderthal heterozygosity.

Figure 5 | Relatedness of introgressing archaic and sequenced archaic
samples. Divergence of phased present-day human genomes to archaic
genomes in windows of size 0.01 cM with a minimum of 25,000 analysed bases.
Windows are sorted by sequence divergence measured on the archaic side of the
tree (Supplementary Information section 13) and the y axis reports the
divergence relative to human-chimpanzee divergence for cumulative fractions
of the sorted windows over the entire genomes. Regions of low divergence
between non-Africans and Neanderthals (a) and between Oceanians and
Denisovans (b) indicate gene flow between these groups and the relative
divergences between the introgressing archaic and sequenced archaic samples.

Figure 6 | Neanderthal gene flow into Siberian Denisovans. Divergence in
0.01 cM sized windows with at least 50 kb analysed bases between a
‘test’-archaic genome and effectively haploid regions of the other archaic genome
plotted against the most recent common ancestor of the two alleles of the
‘test’-archaic. The plot shows 50 equally sized bins of windows for the ‘test’
Denisovan against the effectively haploid Neanderthal (red) and for the
‘test’-archaic Altai Neanderthal against the effectively haploid Denisovan (blue).
Divergence is given as percentage of human-chimpanzee divergence. Windows
that show a close relationship between the effective haploid Altai Neanderthal
and the closest inferred Denisovan haplotype show a deep divergence to the
second Denisovan haplotype, indicating gene flow from Neanderthal into
Denisovan.

Particularly strong signals of Neanderthal gene flow into Denisovans 
are found in the human leucocyte antigen (HLA) region and the CRISP 
gene cluster on chromosome 6 (Extended Data Fig. 2), where we find 
many segments for which one of the Denisova haplotypes and the Altai 
Neanderthal share a common ancestor within a few tens of thousands 
of years before the death of the Altai individual (Supplementary Informa- 
tion section 15). This suggests the possibility that introgressed Neanderthal 
alleles may have contributed to the Denisovan functional variation at 
the HLA and the CRISP cluster, which are involved in immunity and 
sperm function, respectively. This is interesting as it has been suggested
that HLA alleles from Neanderthals and Denisovans have been of functional
relevance in modern humans.


Unknown archaic gene flow into Denisovans 


As the ancestors of both Neanderthals and Denisovans left Africa before 
the emergence of modern humans, one might expect present-day Africans 
to share equal proportions of derived alleles with these two archaic 
groups. However, we find that African genomes share about 7% more 
derived alleles with the Neanderthal genome than with the Denisova 
genome (Z = 11.6 to 13.0; Extended Data Table 2; Supplementary 
Information section 16a) and that this is particularly the case for derived 
alleles that are fixed in Africans, of which 13-16% more are shared with 
the Neanderthal than with the Denisovan genome (Fig. 7). 
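
The excess allele sharing described above is measured with a D-statistic of the form D(Neanderthal, Denisova; Africa, chimpanzee). A minimal sketch of such a statistic from single sampled alleles is shown below. The sign convention (positive when the first population shares more derived alleles with the third), the input format and the function name are assumptions of this sketch, and the standard errors needed for the Z-scores quoted in the text are not computed.

```python
def d_statistic(sites):
    """D(P1, P2; P3, outgroup) from per-site alleles.

    Each site is a tuple (p1, p2, p3, outgroup) of single sampled alleles.
    The outgroup allele defines the ancestral state. D > 0 means P1 shares
    more derived alleles with P3 than P2 does.
    """
    baba = abba = 0
    for p1, p2, p3, out in sites:
        if p3 == out or p1 == p2:
            continue              # P3 must be derived; P1 and P2 must differ
        if p1 == p3 and p2 == out:
            baba += 1             # P1 shares the derived allele with P3
        elif p2 == p3 and p1 == out:
            abba += 1             # P2 shares the derived allele with P3
    if baba + abba == 0:
        raise ValueError("no informative sites")
    return (baba - abba) / (baba + abba)

# Toy data: (Neanderthal, Denisova, African, chimpanzee) alleles at five sites.
toy_sites = [
    ("T", "C", "T", "C"),
    ("G", "A", "G", "A"),
    ("C", "T", "T", "C"),
    ("T", "T", "T", "C"),
    ("A", "A", "A", "A"),
]
print(d_statistic(toy_sites))
```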

We tested three non-mutually exclusive scenarios that could explain 
these observations. First, gene flow from the ancestor of Neanderthals 
after the split from Denisovans into the ancestors of all present-day 
humans would result in more sharing of derived alleles between present- 
day Africans and Neanderthals. However, because gene flow contributes 
alleles at low frequency, the sharing of derived alleles with Neanderthals 
would grow weaker with higher African derived allele frequency (Sup- 
plementary Information section 16a), whereas we observe the opposite 
(Fig. 7). Second, gene flow from the ancestors of present-day humans 
to Neanderthals after their split from Denisovans would also result in 
more sharing of derived alleles. However, the amount of allele frequency 
change (genetic drift) that has occurred in present-day Africans since 
the split from Neanderthals is too small to explain the extent of sharing
of derived alleles fixed in Africans (Supplementary Information section 16a).
Third, we considered a scenario where Denisovans received gene flow
from a hominin whose ancestors diverged deeply from the lineage leading
to Neanderthals, Denisovans and present-day humans. We find that
this scenario is consistent with the data, as also suggested by others,
and estimate that 2.7-5.8% (jackknife 95% confidence interval) of the
Denisova genome comes from this putative archaic hominin which
diverged from the other hominins 0.9-1.4 million years ago
(Supplementary Information section 16a). An approximate Bayesian
computation again supports the third scenario (Supplementary Information
section 16b) and estimates that 0.5-8% of the Denisovan genome
comes from an unknown hominin which split from other hominins
between 1.1 and 4 million years ago.

Figure 7 | Altai and Denisovan allele sharing with Africans stratified by
African allele frequency. The plot shows the D-statistic of the form D
(Neanderthal, Denisova; Africa, chimpanzee) binned by derived allele count in
10 deeply sequenced African genomes. Error bars represent ± 1 standard error.
High-frequency and fixed derived alleles in Africa are more often shared with
the Neanderthal than with the Denisovan genome.

We caution that these analyses make several simplifying assump- 
tions. Despite these limitations, we show that the Denisova genome 
harbours a component that derives from a population that lived before 
the separation of Neanderthals, Denisovans and modern humans. This 
component may be present due to gene flow, or to a more complex 
population history such as ancient population structure maintaining a 
larger proportion of ancestral alleles in the ancestors of Denisovans 
over hundreds of thousands of years. 

The putative admixture into Denisovans from an unknown archaic 
group raises the possibility that the apparent Denisovan contribution 
to the genomes of Papuans and Australians could originate from 
admixture with the same unknown archaic population instead of with 
Denisovans. However, we tested this hypothesis and found that the 
archaic component in the genomes of people in Papua New Guinea 
and Australia comes from a group related to the Denisovans and not 
from an unknown archaic hominin (Supplementary Information sec- 
tion 17). 


Copy number differences 


The high-quality archaic genomes allow us to identify genetic changes 
that may have been relevant for putative biological traits that set mod- 
ern humans apart from archaic humans. To identify genomic regions 
that have changed in copy number during hominin evolution, we used 
the variation of coverage along the two archaic genomes and 25 present- 
day human genomes (Supplementary Information section 8). We find 
three regions that have been duplicated only on the modern human 
lineage (Extended Data Table 3). One region overlaps BOLA2, which 
occurs as a single copy per haploid genome in the archaic genomes but 
has two to five copies in all but one of 675 present-day humans ana- 
lysed, and which is near a microdeletion associated with developmental 
delay, intellectual disability and autism.


Catalogue of modern human changes 

We compiled a genome-wide catalogue of sites where all or nearly all 
of 1,094 present-day humans carry the same nucleotide but differ
from the Neanderthal, Denisovan and great ape genomes (Supplemen- 
tary Information section 18). In the regions of the genome to which 
short fragments can be mapped, there are 31,389 such single nucleotide
substitutions and 4,113 short insertions and deletions (indels) shared
by all present-day humans analysed, and a further 105,757 substitutions
and 3,900 indels shared by 90% of present-day humans. This list
of simple DNA sequence changes that distinguish modern humans from
our nearest extinct relatives is thus comparatively small. For example, it
contains only 96 fixed amino acid substitutions in a total of 87 proteins
and on the order of three thousand fixed changes that potentially influence
gene expression in present-day humans (Supplementary Information
section 18).

Figure 8 | A possible model of gene flow events in the Late Pleistocene. The
direction and estimated magnitude of inferred gene flow events are shown.
Branch lengths and timing of gene flows are not drawn to scale. The dashed line
indicates that it is uncertain if Denisovan gene flow into modern humans in
mainland Asia occurred directly or via Oceania. D.I. denotes the introgressing
Denisovan, N.I. the introgressing Neanderthal. Note that the age of the archaic
genomes precludes detection of gene-flow from modern humans into the
archaic hominins.

Because the manner in which modern and archaic humans may 
have differed in aspects of their cognition is particularly interesting, 
we focused on the expression in the developing human brain of tran- 
scripts encoding the 87 proteins with fixed amino acid changes (Sup- 
plementary Information section 20). In comparison to a control set of 
transcripts that carry 108 silent substitutions fixed in present-day 
humans, genes carrying fixed amino acid changes are more often expressed 
in the ventricular zone of the developing neocortex (P = 0.06, corrected 
for multiple testing). Out of the five genes which are expressed in the 
proliferative layers (ventricular and subventricular zones combined) 
during mid-fetal development (CASC5, KIF18A, TKTL1, SPAG5, VCAM1), 
three (CASC5, KIF18A, SPAG5) are associated with the kinetochore of
the mitotic spindle. This may be relevant phenotypically as the orientation
of the mitotic cleavage plane in neural precursor cells during
cortex development is thought to influence the fate of the daughter cells
and the number of neurons generated (for example, see ref. 33). Another
of these five genes, VCAM1, is essential for maintenance of neural stem
cells in the adult subventricular zone.

Another way to prioritize changes in the catalogue for functional 
studies is to identify those that show signs of having risen to high fre- 
quency rapidly as they may have been affected by positive selection. We 
implemented an HMM to scan the genome for regions where the Neander- 
thal and Denisovan genomes fall outside of the variation of present-day 
humans (Supplementary Information section 19a). We ranked these 
regions, which cover less than 100 Mb of the genome, according to gen- 
etic length, because regions that rose rapidly to fixation are expected to 
be longer as they have been less affected by recombination events. A set 
of 63 regions likely to have been affected by positive selection were 
identified (Supplementary Information Table S19a.3). They contain
2,123 substitutions and 61 indels that are fixed or of high-frequency
(>90%) in modern humans (Supplementary Information section 19b).
They include, for example, the gene RB1CC1 (also called FIP200), which
encodes a transcription factor that, like VCAM1, is essential for maintenance
of neuronal stem cells in the adult subventricular zone. In
present-day humans, but not Neanderthals and Denisovans, RB1CC1
carries a substitution inferred to change an amino acid in the encoded
protein as well as a substitution that affects a conserved site in a motif
that occurs across the genome. Functional investigations will be
necessary to clarify whether these and other such changes affect any 
phenotypes in present-day humans. 


Discussion 


We present evidence for three to five cases of interbreeding among 
four distinct hominin populations (Fig. 8). Clearly the real population 
history is likely to have been even more complex. For example, most 
cases of gene flow are likely to have occurred intermittently, often in 
both directions and across a geographic range. Thus, combinations of 
gene flow among different groups and substructured populations may 
have yielded the patterns detected rather than the discrete events 
considered here. Nevertheless, our analyses show that hominin groups 
met and had offspring on many occasions in the Late Pleistocene, but 
that the extent of gene flow between the groups was generally low. 

We note that the observation that the Neanderthal DNA sequences 
in non-Africans share more derived alleles with the Neanderthal from 
the Caucasus than with Neanderthals from either Croatia or the Altai 
indicates that the archaic gene flow into non-Africans occurred at a 
time when Neanderthal populations had separated from each other. 
We also note that the introgressed Neanderthal DNA sequences sug- 
gest a population split from the Altai Neanderthal between 77,000 and 
114,000 years ago (Supplementary Information section 13), well after 
~230,000 years ago when Neanderthal features appear in the fossil
record. These and other results show that the allele sharing between
Neanderthals and non-African populations is due to recent admixture
rather than ancient population subdivision, an alternative which we
and others previously considered possible.

The evidence suggestive of gene flow into Denisovans from an unknown 
hominin is interesting. The estimated age of 0.9 to 4 million years for 
the population split of this unknown hominin from the modern human 
lineage is compatible with a model where this unknown hominin contributed
its mtDNA to Denisovans since the Denisovan mtDNA diverged
from the mtDNA of the other hominins about 0.7-1.3 million years
ago. The estimated population split time is also compatible with the
possibility that this unknown hominin was what is known from the
fossil record as Homo erectus. This group started to spread out of Africa
around 1.8 million years ago, but Asian and African H. erectus populations
may have become finally separated only about one million years
ago. However, further work is necessary to establish if and how this
gene flow event occurred. 


METHODS SUMMARY 


Sequences were generated on the Illumina HiSeq 2500 and base-calling was carried 
out using Ibis. For all present-day human samples there is informed consent
consistent with their use for whole genome sequencing and dissemination of data.
Reads were merged and adaptor trimmed as described and mapped to the human
reference genome using BWA (version 0.5.10). Genotyping was carried out using 
GATK (version 1.3). We restricted analyses to regions of the genome that are non- 
repetitive (excluding tandem repeats), unique (requiring at least 50%, or all, over- 
lapping 35-mers covering a position to map uniquely, allowing for one mismatch), 
and fall within the central 95% of the coverage distribution corrected for GC bias 
(Supplementary Information section 5b). The Supplementary Information describes 
the details of data processing and other analyses. 
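
The last of the filters above, keeping positions whose coverage falls within the central 95% of the coverage distribution, can be sketched as follows. The sketch does not model the GC-bias correction, uses a simple nearest-rank rule for the 2.5th and 97.5th percentiles, and takes coverage as a plain position-to-depth mapping; all of these, together with the function names, are assumptions for illustration and not the pipeline of Supplementary Information section 5b.

```python
def central_coverage_bounds(coverages, central_fraction=0.95):
    """Return (lower, upper) coverage bounds enclosing the central fraction."""
    ordered = sorted(coverages)
    tail = (1.0 - central_fraction) / 2.0
    last = len(ordered) - 1
    lower = ordered[round(tail * last)]          # about the 2.5th percentile
    upper = ordered[round((1.0 - tail) * last)]  # about the 97.5th percentile
    return lower, upper

def keep_positions(coverage_by_position, bounds):
    """Positions whose coverage lies within the given bounds (inclusive)."""
    lower, upper = bounds
    return [pos for pos, cov in coverage_by_position.items()
            if lower <= cov <= upper]

# Toy data: position i has coverage i, for i = 0..200.
coverage = {pos: pos for pos in range(201)}
bounds = central_coverage_bounds(coverage.values())
kept = keep_positions(coverage, bounds)
print(bounds, len(kept))
```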


Online Content Any additional Methods, Extended Data display items and 
Source Data are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

Extended Data Figure 1 | Heterozygosity estimates for the Altai
Neanderthal individual, the Denisovan individual, non-Africans and 
Africans. The bars for the latter two give the range of heterozygosity 
observed among 15 non-African and 10 African individuals, respectively 
(Supplementary Information section 9). 


©2013 Macmillan Publishers Limited. All rights reserved 




[Figure: logarithmic vertical axis from 0.0001 to 0.1; horizontal axis "Chromosome 6 (in Mb)", 0 to 80; the CRISP cluster is marked.]

Extended Data Figure 2 | Neanderthal-introgressed loci in Denisova.
Divergence of the Altai Neanderthal to the most closely related Denisovan
haplotype in windows of at least 200 kb on chromosome 6. Divergence is given
as percentage of human-chimpanzee divergence and bars represent ± 1
standard error.

Extended Data Table 1 | Neanderthal ancestry estimate

            Other Neanderthal = Mezmaiskaya      Other Neanderthal = Vindija
            Panel A           Panel B            Panel A           Panel B
Population  α      Std. Err.  α      Std. Err.   α      Std. Err.  α      Std. Err.
French      0.020  0.003      0.019  0.003       0.016  0.002      0.017  0.002
Sardinian   0.019  0.002      0.017  0.003       0.018  0.002      0.018  0.002
Han         0.022  0.003      0.018  0.003       0.023  0.002      0.019  0.002
Dai         0.019  0.003      0.016  0.003       0.019  0.002      0.016  0.002
Karitiana   0.020  0.003      0.019  0.003       0.018  0.002      0.019  0.002
Mixe        -      -          0.018  0.003       -      -          0.017  0.002

Note: we estimate ancestry using the equation
$\alpha = \dfrac{f_4(\text{Denisova}, \text{Altai}, \text{Africa}, X)}{f_4(\text{Denisova}, \text{Altai}, \text{Africa}, \text{Other Neanderthal})}$
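
The ratio in the table note can be illustrated with a short sketch. It assumes the standard definition of f4 as the average over sites of the product of two allele-frequency differences; the function names and the input format (one allele frequency per site and population) are hypothetical, and standard errors (usually obtained with a block jackknife) are not computed.

```python
from typing import Sequence


def f4(a: Sequence[float], b: Sequence[float],
       c: Sequence[float], d: Sequence[float]) -> float:
    """f4(A, B, C, D): mean over sites of (a - b) * (c - d)."""
    if not (len(a) == len(b) == len(c) == len(d)) or not a:
        raise ValueError("need equal-length, non-empty frequency vectors")
    return sum((ai - bi) * (ci - di)
               for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def neanderthal_ancestry(denisova: Sequence[float], altai: Sequence[float],
                         africa: Sequence[float], x: Sequence[float],
                         other_neanderthal: Sequence[float]) -> float:
    """alpha = f4(Denisova, Altai, Africa, X) / f4(Denisova, Altai, Africa, Other Neanderthal)."""
    denominator = f4(denisova, altai, africa, other_neanderthal)
    if denominator == 0:
        raise ZeroDivisionError("denominator f4 statistic is zero")
    return f4(denisova, altai, africa, x) / denominator
```

The ratio recovers the mixing proportion when population X is an exact mixture of the African and Other Neanderthal frequencies, which is the intuition behind the estimator.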

Extended Data Table 2 | Selected D-statistics supporting inferences about gene flows

D-statistic
D(French, Dinka; Altai, Chimp)
D(Han, Dinka; Altai, Chimp)
D(Han, Papuan; Denisova, Chimp)
D(Han, Australian; Denisova, Chimp)
D(Altai, Mezmaiskaya; French, Dinka)
D(Altai, Vindija; French, Dinka)
D(Altai, Mezmaiskaya; Denisova, Chimp)
D(Altai, Vindija; Denisova, Chimp)
D(Altai, Denisova; 12 Africans, Chimp)
D(Altai, Denisova; 12 Africans Fixed, Chimp)

D
-7.0%
-7.7%
-16.4%
-7.0%
13.2%
7.9%
7.0%
13.4%

Z
9.2
11.4
-9.5
-10.7
-5.8
5.9
5.6
11.6

Interpretation
SI 14: Neanderthals share more derived alleles with non-Africans than with Africans.
SI 14: Denisovans share more derived alleles with Oceanian populations than with other non-Africans.
SI 14: The archaic material in non-Africans falls within late Neanderthal variation: Non-Africans share more alleles with some Neanderthals (Mezmaiskaya/Vindija) than others (Altai).
SI 15: Gene flow between Altai related Neanderthals and Denisovans (Denisovans share more derived alleles with Altai than with Mezmaiskaya).
SI 16: Unknown archaic gene flow into Denisova: Africans share more derived alleles with Altai than with Denisova, a signal that strengthens for fixed derived alleles.
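
A statistic of the form D(W, X; Y, Z) can be computed from allele frequencies as in the sketch below. It assumes the widely used allele-frequency formulation, in which positive values mean that W shares more alleles with Y than X does; the sign convention behind the values in the table is not stated in this excerpt, so this orientation is an assumption. The Z-scores, which are typically obtained with a block jackknife, are not reproduced.

```python
from typing import Sequence


def d_statistic(w: Sequence[float], x: Sequence[float],
                y: Sequence[float], z: Sequence[float]) -> float:
    """D(W, X; Y, Z) from per-site allele frequencies (0 or 1 for single genomes)."""
    if not (len(w) == len(x) == len(y) == len(z)) or not w:
        raise ValueError("need equal-length, non-empty frequency vectors")
    # Numerator: excess of BABA-like over ABBA-like sites.
    numerator = sum((wi - xi) * (yi - zi)
                    for wi, xi, yi, zi in zip(w, x, y, z))
    # Denominator: total weight of informative sites.
    denominator = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
                      for wi, xi, yi, zi in zip(w, x, y, z))
    if denominator == 0:
        raise ValueError("no informative sites")
    return numerator / denominator
```

For single genomes coded 0 or 1 this reduces to (BABA - ABBA) / (BABA + ABBA): three BABA sites and one ABBA site give D = 0.5.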

Extended Data Table 3 | Lineage-specific segmental duplications along each of the terminal branches and genes encompassed

                                                                              Genotypes
Locus                      Length  Lineage            Genes                   Modern humans (median)  Denisova  Altai  Mezmaiskaya
chr12:122079832-122087495  7663    Altai-Neanderthal  ORAI1                   2                       2         4      3
chr12:132295389-132391442  96053   Altai-Neanderthal  MMP17,ULK1              2                       2         4      2
chr19:9284044-9291195      7151    Altai-Neanderthal  -                       2                       2         4      4
chr20:281880-290717        8837    Altai-Neanderthal  -                       2                       2         10     9
chr3:12639069-12641393     2324    Altai-Neanderthal  RAF1                    2                       2         7      3
chr6:95473793-95532866     59073   Altai-Neanderthal  -                       2                       2         3      2
chr11:39901956-39909545    7589    Denisova           -                       2                       4         2      2
chr1:161272681-161274838   2157    Denisova           MPZ                     2                       4         2      2
chr12:49894191-49897733    3542    Denisova           SPATS2                  2                       4         2      2
chr19:55302094-55315197    13103   Denisova           KIR3DP1,KIR2DL4         2                       4         2      2
chr2:48781187-48787915     6728    Denisova           -                       2                       3         2      2
chr4:68542692-68577288     34596   Denisova           UBA6,LOC550112          2                       3         2      2
chr4:68579206-68581585     2379    Denisova           LOC550112               2                       3         2      2
chr7:140872574-140879065   6491    Denisova           LOC100131199            2                       6         2      2
chr1:108924526-108990191   65665   Modern Human       -                       4                       2         2      2
chr16:30200098-30206185    6087    Modern Human       CORO1A,LOC606724,BOLA2  6                       2         2      2
chr2:87417089-87420544     3455    Modern Human       -                       4                       2         2      2


A molecular marker of artemisinin-resistant Plasmodium falciparum malaria


Frédéric Ariey, Benoit Witkowski, Chanaki Amaratunga, Johann Beghain, Anne-Claire Langlois, Nimol Khim,
Saorin Kim, Valentine Duru, Christiane Bouchier, Laurence Ma, Pharath Lim, Rithea Leang, Socheat Duong,
Sokunthea Sreng, Seila Suon, Char Meng Chuor, Denis Mey Bout, Sandie Ménard, William O. Rogers, Blaise Genton,
Thierry Fandeur, Olivo Miotto, Pascal Ringwald, Jacques Le Bras, Antoine Berry, Jean-Christophe Barale,
Rick M. Fairhurst, Françoise Benoit-Vical, Odile Mercereau-Puijalon & Didier Ménard


Plasmodium falciparum resistance to artemisinin derivatives in southeast Asia threatens malaria control and elimination 
activities worldwide. To monitor the spread of artemisinin resistance, a molecular marker is urgently needed. Here, 
using whole-genome sequencing of an artemisinin-resistant parasite line from Africa and clinical parasite isolates from 
Cambodia, we associate mutations in the PF3D7_1343700 kelch propeller domain (‘K13-propeller’) with artemisinin 
resistance in vitro and in vivo. Mutant K13-propeller alleles cluster in Cambodian provinces where resistance is 
prevalent, and the increasing frequency of a dominant mutant K13-propeller allele correlates with the recent spread 
of resistance in western Cambodia. Strong correlations between the presence of a mutant allele, in vitro parasite survival 
rates and in vivo parasite clearance rates indicate that K13-propeller mutations are important determinants of 
artemisinin resistance. K13-propeller polymorphism constitutes a useful molecular marker for large-scale surveillance 
efforts to contain artemisinin resistance in the Greater Mekong Subregion and prevent its global spread. 


The emergence of Plasmodium falciparum resistance to artemisinin
derivatives (ART) in Cambodia threatens the world’s malaria control
and elimination efforts. The risk of ART-resistant parasites spreading
from western Cambodia to the Greater Mekong Subregion and to
Africa, as happened previously with chloroquine- and
sulphadoxine/pyrimethamine-resistant parasites, is extremely worrisome. Clinical
ART resistance is defined as a reduced parasite clearance rate,
expressed as an increased parasite clearance half-life, or a persistence
of microscopically detectable parasites on the third day of
artemisinin-based combination therapy (ACT). The half-life parameter correlates
strongly with results from the in vitro ring-stage survival assay (RSA0-3h)
and results from the ex vivo RSA, which measure the survival rate of
young ring-stage parasites to a pharmacologically relevant exposure
(700 nM for 6 h) to dihydroartemisinin (DHA), the major metabolite
of all ARTs. However, the present lack of a molecular marker hampers
focused containment of ART-resistant parasites in areas where they
have been documented and hinders rapid detection of these parasites
elsewhere, where ACTs remain the most affordable, effective
antimalarials. To detect and monitor the spread of ART resistance, a
molecular marker for widespread use is needed.
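
As a rough illustration of how a clearance half-life relates to a clearance rate: if parasite density declines log-linearly with rate constant k, the half-life is ln(2)/k. The sketch below fits k by ordinary least squares on log-transformed densities. It is a simplified assumption for illustration, not the estimator used in the studies cited here, and the function name and inputs are hypothetical.

```python
import math
from typing import Sequence


def clearance_half_life(hours: Sequence[float],
                        parasites_per_ul: Sequence[float]) -> float:
    """Half-life in hours from a least-squares fit of ln(density) against time."""
    if len(hours) != len(parasites_per_ul) or len(hours) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log(p) for p in parasites_per_ul]
    mean_t = sum(hours) / len(hours)
    mean_y = sum(logs) / len(logs)
    sxx = sum((t - mean_t) ** 2 for t in hours)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(hours, logs))
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    slope = sxy / sxx                      # equals -k for a declining curve
    if slope >= 0:
        raise ValueError("parasite density is not declining")
    return math.log(2) / -slope
```

A density that halves every 3 h (for example 80,000, 40,000, 20,000 and 10,000 parasites per µl at 0, 3, 6 and 9 h; illustrative numbers only) yields a half-life of 3 h.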

Recent genome-wide analyses of P. falciparum isolates have pro- 
vided evidence of recent positive selection in geographic areas of ART 
resistance. Whereas parasite heritability of the clinical phenotype
is above 50%, no reliable molecular marker has yet been identified. One
possible explanation is that the parasite clearance half-life is not only 
determined by the intrinsic susceptibility of a parasite isolate to ART, 
but also by its developmental stage at the time of ART treatment and 
host-related parameters such as pharmacokinetics and immunity.
This issue was recently highlighted in patients presenting discordant
data between parasite clearance half-life in vivo and RSA0-3h survival
rate in vitro. Moreover, genome-wide association studies (GWAS)
are confounded by uncertainties about parasite population structure.
Recent evidence for several highly differentiated subpopulations of ART-
resistant parasites in western Cambodia suggests that distinct emergence
events might be occurring. An alternative strategy to discover a
molecular marker is to analyse mutations acquired specifically by 
laboratory-adapted parasite clones selected to survive high doses of 
ART in vitro, and use this information to guide analysis of polymorph- 
ism in clinical parasite isolates from areas where ART resistance is well 
documented at both temporal and geographical levels. Here we used 
this strategy to explore the molecular signatures of clinical ART
resistance in Cambodia, where this phenotype was first reported.


A candidate molecular marker of ART resistance 


The ART-resistant F32-ART5 parasite line was selected by culturing 
the ART-sensitive F32-Tanzania clone under a dose-escalating, 125-cycle 
regimen of artemisinin for 5 years. Whole-genome sequences were
obtained for both F32-ART5 and F32-TEM (its sibling clone cultured
without artemisinin) at 460× and 500× average nucleotide coverage,
respectively. Compared to F32-TEM, no deleted genes were identified 
in F32-ART5. The exomes of F32-ART5 and F32-TEM were compared 
after excluding (1) genes from highly variable, multi-gene families (var, 
rifin and stevor), (2) positions with coverage lower than 25% of the 
mean coverage of the parasite line, (3) single-nucleotide polymorph- 
isms (SNPs) found to be mixed in F32-ART5, given that acquired 
ART-resistance mutation(s) could be expected to be fixed in the sam- 
ple after 5 years of continuous pressure, (4) SNPs shared between F32- 
ART5 and the ART-sensitive 3D7 parasite strain and (5) synonymous 
SNPs (Extended Data Fig. 1). 
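
The five exclusion criteria above can be read as a chain of filters over candidate SNPs. The sketch below shows that logic only; the `Snp` record, its field names and the `candidate_snps` helper are hypothetical and do not reflect the file formats or tools the authors used.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Criterion (1): highly variable multi-gene families named in the text.
VARIABLE_FAMILIES = {"var", "rifin", "stevor"}


@dataclass(frozen=True)
class Snp:
    gene_id: str
    gene_family: str       # "" when the gene is not in a multi-gene family
    coverage: float        # read depth at this position in F32-ART5
    mixed_in_art5: bool    # more than one allele observed in F32-ART5
    shared_with_3d7: bool  # same allele carried by the ART-sensitive 3D7
    synonymous: bool


def candidate_snps(snps: Iterable[Snp], mean_coverage: float,
                   min_fraction: float = 0.25) -> List[Snp]:
    """Apply exclusion criteria (1) to (5) and return the remaining SNPs."""
    threshold = min_fraction * mean_coverage          # criterion (2)
    return [
        s for s in snps
        if s.gene_family not in VARIABLE_FAMILIES     # (1)
        and s.coverage >= threshold                   # (2)
        and not s.mixed_in_art5                       # (3)
        and not s.shared_with_3d7                     # (4)
        and not s.synonymous                          # (5)
    ]
```

Applying the filters in a single pass keeps the order of criteria irrelevant: a SNP survives only if it passes all five.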

This analysis identified eight mutations in seven genes that were 
subsequently confirmed by Sanger sequencing of PCR products (Extended 
Data Table 1). Each gene harbours one mutant codon in F32-ART5 
compared to F32-TEM, F32-Tanzania or 3D7 (Extended Data Table 2). 
Information on the expression of the genes and the biological function 
of the proteins are listed in Extended Data Table 3. Only one of these 
genes, cysteine protease falcipain 2a (PF3D7_1115700), has previously 
been associated with in vitro responses to ART. To determine when
each mutation arose in the F32-ART5 lineage, we analysed the whole-
genome sequences of parasites at various drug-pressure cycles (Fig. 1).
This analysis showed that the PF3D7_0110400 D56V and PF3D7_1343700
M476I mutations were acquired first, during the steep increase
of ART resistance, and remained stable thereafter. Importantly, the
appearance of these two mutations is associated with an increase in
the RSA0-3h survival rate, from less than 0.01% to 12.8%. Subsequent
PCR analysis of the PF3D7_1343700 locus detected the M476I mutation
after 30 drug-pressure cycles, consistent with the sharp increase in
RSA0-3h survival rate observed thereafter. The other SNPs appeared
stepwise at later stages of selection: PF3D7_0213400 (68 cycles);
PF3D7_1115700 (98 cycles); PF3D7_1302100, PF3D7_1459600 and
PF3D7_1464500 (120 cycles) (Extended Data Table 2). These data indicate that
the PF3D7_1343700 M476I mutation increased the resistance of F32-
Tanzania to DHA in the RSA0-3h.

To explore whether these mutations are associated with ART resistance
in Cambodia, we investigated sequence polymorphism in all seven
genes by mining whole-genome or Sanger sequences for 49 culture-
adapted parasite isolates collected in 2010-2011 (see Methods). We chose
these isolates based on their differential RSA0-3h survival rates
(Supplementary Table 1) and their sequences were compared to those of
control parasite lines 3D7, 89F5 and K1992 (see Methods). Three
genes (PF3D7_0110400, PF3D7_0213400 and PF3D7_1302100) encode
a wild-type sequence for all parasite isolates. The other four genes show
intra-population diversity, with previously reported or novel SNPs
(Supplementary Table 1). PF3D7_1115700 has 11 SNPs that are not
associated with RSA0-3h survival rates (P = 0.06, Kruskal-Wallis test).
PF3D7_1459600 has 6 SNPs that are not associated with survival rates
(P = 0.65). PF3D7_1464500 has 12 SNPs previously reported in older
isolates from southeast Asia, including the ART-susceptible Dd2 line,
probably reflecting a geographic signature. These SNPs also show no
significant association with survival rates (P = 0.42). Therefore, these
six genes were not studied further.

In contrast, PF3D7_1343700 polymorphism shows a significant 
association with RSA0-3h survival rates (Fig. 2). Indeed, RSA0-3h survival
rates differ substantially between parasite isolates with wild-type (median 
0.17%, range 0.06-0.51%, n = 16) or mutant (18.8%, 3.8-58%, n = 33) 
K13-propeller alleles (P< 10° *, Mann-Whitney U test) (Supplementary 
Table 1). Four mutant alleles are observed, each harbouring a single 
non-synonymous SNP within a kelch repeat of the C-terminal K13-
propeller domain, namely Y493H, R539T, I543T and C580Y located
within repeats no. 2, 3, 3 and 4, respectively. Both the K1992 and the 
ART-susceptible 89F5 lines carry a wild-type K13-propeller. There are 
no associations between polymorphisms in the K13-propeller and those 
in the other candidate genes (Supplementary Table 1). Based on these
observations and the acquisition of M476I in kelch repeat no. 2 by
F32-ART5, we investigated whether K13-propeller polymorphism is a
molecular signature of ART resistance in Cambodia.

Figure 1 | Temporal acquisition of mutations in F32-ART5. F32-Tanzania
parasites exposed to increasing artemisinin concentrations for 120 consecutive
cycles were analysed by whole-genome sequencing at five time-points
(red arrows). Loci mutated after a given number of drug-pressure cycles
are shown (red boxes). The earliest time-points where three mutations were
detected by PCR (black arrows) are indicated by † for PF3D7_1343700, * for
PF3D7_0213400 and ‡ for PF3D7_1115700. Orange and green circles indicate
RSA0-3h survival rates for F32-ART5 and F32-TEM parasites, respectively
(mean of 3 experiments each).


Emergence and spread of K13-propeller mutants in Cambodia 


Figure 2 | Survival rates of Cambodian parasite isolates in the RSA0-3h,
stratified by K13-propeller allele. Genotypes were obtained by mining whole-
genome sequence data (n = 21) or sequencing PCR products (n = 28). Mutant
parasites have significantly higher RSA0-3h survival rates than wild-type
parasites: wild type (n = 17, median 0.16%, IQR 0.09-0.24, range 0.04-0.51);
C580Y (n = 26, median 14.1%, IQR 11.3-19.6, range 3.8-27.3, P< 10° for
wild type versus C580Y, Mann-Whitney U test); R539T (n = 5, median 24.2%,
IQR 12.6-29.5, range 5.8-31.3, P< 10° ° for wild type versus R539T); Y493H
(51.4%); and I543T (58.0%). The RSA0-3h survival rate (0.04%) of control 3D7
parasites is indicated by an asterisk.

Over the last decade, the prevalence of ART resistance has steadily
increased in the western provinces of Cambodia, but not elsewhere in
the country. To test whether the spatiotemporal distribution of K13-
propeller mutations correlates with that of ART resistance, we sequenced 
the K13-propeller of archived parasite isolates from Cambodian patients 
with malaria in 2001-2012 (Extended Data Table 4). Data from six 
provinces were compared (n = 886): Pailin, Battambang and Pursat in 
the west where ART resistance is established, Kratie in the south-
east where ART resistance has increased in recent years, and Preah
Vihear in the north and Ratanakiri in the northeast where there was
virtually no evidence of ART resistance during this time period. This
analysis reveals overall 17 mutant alleles, including three high-frequency 
(> 5%) alleles (C580Y, R539T and Y493H). The frequency of wild-type 
sequence decreased significantly over time in all three western pro- 
vinces, but not in Preah Vihear or Ratanakiri. The frequency of the 
C580Y allele increased significantly from 2001-2002 to 2011-2012 in 
Pailin and Battambang, indicating its rapid invasion of the population 
and near fixation in these areas (Fig. 3). 

To further investigate the geographic diversity of K13-propeller 
polymorphism in Cambodia, we extended our sequence analysis to 
include data from four additional provinces (n = 55, Kampong Som, 
Kampot, Mondulkiri and Oddar Meanchey) in 2011-2012 (Extended 
Data Table 4). Although a large number of mutations are observed (Sup- 
plementary Fig. 1 and Extended Data Table 5), the C580Y allele accounts 
for 85% (189/222) of all mutant alleles observed in 2011-2012 (Extended
Data Fig. 2). This mapping outlines the elevated frequency (74%, 222/300)
of parasites harbouring a single non-synonymous mutation in
the K13-propeller and the geographic disparity of their distribution. 
Importantly, the frequency distribution of mutant alleles over the vari- 
ous provinces matches that of day 3 positivity in patients treated for 
malaria with an ACT (Spearman’s r = 0.99, 95% confidence interval 
0.96-0.99, P< 0.0001), considered a suggestive sign of clinical ART 
resistance (Extended Data Fig. 3). 
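
For readers unfamiliar with the statistic, Spearman's r is the Pearson correlation of the ranks of two variables, here the per-province frequency of mutant alleles and the per-province day 3 positivity rate. The sketch below is a plain-Python illustration with average ranks for ties; the function names are hypothetical, and the province-level values behind the reported r = 0.99 are not given in this excerpt.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("ranks have zero variance")
    return cov / math.sqrt(vx * vy)
```

Because only ranks matter, any strictly increasing relationship between the two province-level measures gives r = 1, regardless of its shape.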


K13-propeller polymorphisms and clinical ART resistance


To confirm that K13-propeller polymorphism is a molecular marker 
of clinical ART resistance, we first identified 163 patients from Pursat 
and Ratanakiri in whom we measured parasite clearance half-lives 
(range 1.58-11.53 h) in 2009-2010 and for which parasites were previously
assigned to a KH subpopulation (KH1, KH2, KH3, KH4 or KHA)
on the basis of ancestry analysis of whole-genome sequence data.
Thirteen patients with mixed genotypes (a wild-type and one or more 
mutant K13-propeller alleles) were excluded. Of the remaining 150 
patients, 72 carried parasites with a wild-type allele and the others carried 
parasites with only a single non-synonymous SNP in the K13-propeller: 
C580Y (n = 51), R539T (n = 6) and Y493H (n = 21) (Extended Data 
Table 6). The parasite clearance half-life in patients with wild-type para- 
sites is significantly shorter (median 3.30 h, interquartile range (IQR) 
2.59-3.95) than those with C580Y (7.19 h, 6.47-8.31, P< 10°, Mann- 
Whitney U test), R539T (6.64 h, 6.00-6.72, P< 10° *) or Y493H (6.28 h, 
5.37-7.14, P<10 °) parasites (Fig. 4a). Also, the parasite clearance 
half-life in patients carrying C580Y parasites is significantly longer than 
those with Y493H parasites (P = 0.007, Mann-Whitney U test). These 
data indicate that C580Y, R539T and Y493H identify slow-clearing 
parasites in malaria patients treated with ART. 
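
The comparisons above rely on the Mann-Whitney U test. A minimal sketch of the U statistic and its large-sample normal approximation follows; it omits the tie correction and exact small-sample P values that a statistics package would provide, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(sample_a: Sequence[float],
                   sample_b: Sequence[float]) -> Tuple[float, float]:
    """Return (U_a, U_b); U_a counts pairs with a > b, ties counting one half."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    return u_a, len(sample_a) * len(sample_b) - u_a


def normal_approx_z(u: float, n_a: int, n_b: int) -> float:
    """z-score of U under the null hypothesis (no tie correction)."""
    mean = n_a * n_b / 2.0
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    return (u - mean) / sd
```

When every half-life in one group is shorter than every half-life in the other, U for the first group is 0, the most extreme value possible for those sample sizes.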

Because KH2, KH3, KH4 and KHA parasites have longer half-lives
than KH1 parasites, we proposed that allelic variation in the K13-
propeller accounts for these differences. Among 150 parasites, 55, 26,
14, 12 and 43 are classified as KH1, KH2, KH3, KH4 and KHA, respectively.
Three K13-propeller alleles strongly associate with KH groups: 96%
(53/55) of KH1, 96% (25/26) of KH2 and 100% (12/12) of KH4 parasites
carry the wild-type, C580Y and Y493H alleles, respectively (Extended
Data Table 6). Whereas KH3 parasites (n = 14) carry the wild-type,
C580Y and R539T alleles, R539T is not observed in KH1, KH2 or KH4
parasites. As expected, KHA parasites have a mixed allele composition.
Importantly, K13-propeller mutations more accurately identify slow-
clearing parasites than KH group (Fig. 4b), demonstrating that the
association of K13-propeller polymorphism with clinical ART resistance
in Cambodia is partially independent of the genetic background
of KH subpopulations. Within the KH1 group (n = 55), the parasite
clearance half-life in patients with wild-type parasites is significantly
shorter (n = 53, median 2.88 h, IQR 2.52-3.79) than those with Y493H
parasites (n = 2, median 6.77 h, P = 0.02, Mann-Whitney U test). Within
the KH3 subpopulation (n = 14), the half-life in patients with wild-type
parasites is shorter (n = 3, median 3.65 h) than those with C580Y
(n = 7, median 8.73 h, IQR 7.35-9.06, P = 0.02) or R539T (n = 4,
6.65 h, 6.29-6.80, P = 0.03) parasites.

Figure 3 | Frequency of K13-propeller alleles in 886 parasite isolates in
six Cambodian provinces in 2001-2012. Genotypes were obtained by
sequencing PCR products from archived blood samples. All mutant alleles
carry a single non-synonymous SNP (colour-coded, same colour codes as in
Fig. 2 for wild type, C580Y, R539T, Y493H and I543T). Significant reductions
(Fisher’s exact test) in wild-type allele frequencies were observed in Pailin,
Battambang, Pursat and Kratie over time (see Methods).

Figure 4 | Parasite clearance half-lives. a, Correlation of parasite clearance
half-lives and K13-propeller alleles for parasite isolates in Pursat and Ratanakiri
in 2009-2010. Wild-type parasites have shorter half-lives (median 3.30 h, IQR
2.59-3.95, n = 72) than C580Y (7.19 h, 6.47-8.31, n = 51, P< 10 °, Mann-
Whitney U test), R539T (6.64 h, 6.00-6.72, n = 6, P< 10°) or Y493H (6.28 h,
5.37-7.14, n = 21, P< 10°) parasites. The half-life of C580Y parasites is
significantly longer than that of Y493H parasites (P = 0.007). b, Correlation of
parasite clearance half-lives, KH subpopulations and K13-propeller alleles for
the same 150 parasite isolates. Half-lives are shown for Pursat (squares) and
Ratanakiri (triangles) parasites, stratified by KH group and K13-propeller allele
(colour-coded as in a). Median half-lives stratified by K13-propeller allele are
KH1: wild type (2.88) and Y493H (6.77); KH2: C580Y (7.13) and Y493H (4.71);
KH3: wild type (3.65), C580Y (8.73) and R539T (6.65); KH4: Y493H (6.37); and
KHA: wild type (4.01), C580Y (7.09), Y493H (6.18) and R539T (5.73).


Discussion 


The F32-ART5 lineage acquired a K13-propeller mutation as it developed 
ART resistance, as indicated by its ability to survive a pharmacologically
relevant exposure to DHA in the RSA0-3h. Genes putatively associated
with ART resistance (Pfcrt, Pftctp, Pfmdr1, Pfmrp1 and
ABC transporters) or encoding putative targets of ART (PfATPase6
and Pfubcth, the orthologue of Plasmodium chabaudi ubp1) were
not mutated during the 5-year selection of F32-ART5, and Pfmdr1
amplification was not observed. In addition, all candidate ART-
resistance genes recently identified using population genetics approaches
remained unaltered in F32-ART5, except for PF3D7_1343700
and PF3D7_1459600 located in the linkage-disequilibrium windows 
identified in ref. 16. These findings led us to identify another 17 single 
K13-propeller mutations in naturally circulating parasites in Cambodia. 
Several of these mutations associate strongly with the spatiotemporal 
distribution of ART resistance in Cambodia, increased parasite sur- 
vival rates in response to DHA in vitro, and long parasite clearance 
half-lives in response to ART treatment in vivo. None of the six other 
genes mutated in F32-ART5 associate with RSA0-3h survival rates in
parasite isolates from Cambodia. 

K13-propeller polymorphism fulfils the definition of a molecular 
marker of ART resistance for several reasons: (1) there has been a pro- 
gressive loss of wild-type parasites in western Cambodia during the 
decade of emerging ART resistance in this region; (2) mutant para- 
sites cluster in Cambodian provinces where ART resistance is well 
established and are less prevalent where ART resistance is uncommon; 
(3) PF3D7_1343700 is located 5.9 kilobases upstream of the 35-kb locus 
identified in ref. 14 as being under recent positive selection, and within 
the region of top-ranked signatures of selection outlined in ref. 16; 
(4) multiple mutations, all non-synonymous, are present in the K13- 
propeller, reflecting positive selection rather than a hitchhiking effect 
or genetic drift; (5) mutations occur in a domain that is highly conserved 
in P. falciparum, with only one non-synonymous SNP being documented
in a single parasite isolate from Africa; (6) all polymorphisms
we observe in Cambodia are novel and all but one (V568G) occur at 
positions strictly conserved between Plasmodium species (Supplemen- 
tary Fig. 1 and Supplementary Fig. 2), suggesting strong structural and 
functional constraints on the protein; (7) the three most-prevalent 
K13-propeller mutations correlate strongly with RSA0-3h survival
rates in vitro and parasite clearance half-lives in vivo at the level of 
individual parasite isolates and malaria patients, respectively; and (8) the 
frequency of mutant alleles correlates strongly with the prevalence of 
day 3 positivity after ACT treatment at the level of human populations 
in Cambodia. 

On the basis of homology with other kelch propeller domains, we 
anticipate that the observed K13-propeller mutations destabilize the 
domain scaffold and alter its function. The carboxy-terminal portion 
of PF3D7_1343700 encodes six kelch motifs, which are found in a large 
number of proteins with diverse cellular functions. Given that the
toxicity of ART derivatives depends principally on their pro-oxidant 
activity, the reported role of some kelch-containing proteins in regu- 
lating cytoprotective and protein degradation responses to external 
stress is particularly intriguing. The K13-propeller shows homology 
with human KLHL12 and KLHL2, involved in ubiquitin-based protein 
degradation, and KEAP1, involved in cell adaptation to oxidative stress 
(Extended Data Fig. 4). More work is needed to delineate the normal 
function of K13 and the effect of various mutations. Allele exchange 
studies in mutant and wild-type parasites may help to define the contribution
of K13-propeller polymorphisms on different genetic backgrounds
to the RSA0-3h survival rate. Indeed, it is particularly worrying
that as few as two mutations, that is, the K13-propeller M476I and 
PF3D7_0110400 D56V, were sufficient to confer ART resistance to F32- 
Tanzania, which has a typical African genetic background. Cambodian 
parasites with mutant K13-propellers display a wide range of RSA0-3h
survival rates (3.8-58%) and parasite clearance half-lives (4.5-11.5 h). 
Further studies are therefore required to identify additional genetic 
determinants of ART resistance, which may reside in the strongly 
selected regions recently identified. In this context, analysing the
RSA0-3h survival rates as a quantitative trait among parasites harbouring
the same K13-propeller mutation could help to identify additional
genetic loci involved in ART resistance. 

In summary, K13-propeller polymorphism seems to be a useful
molecular marker for tracking the emergence and spread of ART- 
resistant P. falciparum. 


METHODS SUMMARY 


The ART-resistant F32-ART5 parasite line was selected by culturing the ART- 
sensitive F32-Tanzania clone under a dose-escalating regimen of artemisinin for 
5 years. The F32-TEM line was obtained by culturing F32-Tanzania in parallel 
without artemisinin exposure. Reference DNA was extracted from P. falciparum 
lines 3D7, 89F5 Palo Alto Uganda and K1992. The ring-stage survival assay (RSA0-3h)
was performed as described previously. Whole-genome sequencing was performed
on F32-Tanzania, F32-TEM, F32-ART5 (4 time points), three reference
strains (3D7, 89F5 and K1992) and 21 Cambodian parasite isolates, using an 
Illumina paired-reads sequencing technology. A set of 1091 clinical P. falciparum 
isolates was collected from patients participating in ACT efficacy studies in 2001- 
2012. The K13-propeller was amplified using nested PCR. Double-strand sequen- 
cing of PCR products was performed by Macrogen. Sequences were analysed with 
MEGA 5 software version 5.10 to identify specific SNP combinations. Data were 
analysed with Microsoft Excel and MedCalc version 12. Differences were consid- 
ered statistically significant when P values were less than 0.05. Ethical clearances 
for parasite isolate collections were obtained from the Cambodian National Ethics 
Committee for Health Research, the Institutional Review Board of the Naval 
Medical Research Center, the Technical Review Group of the WHO Regional 
Office for the Western Pacific, and the Institutional Review Board of the National 
Institute of Allergy and Infectious Diseases. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 


METHODS


Artemisinin- and mock-pressured P. falciparum F32 lineages. Mycoplasma- 
free F32-Tanzania parasites were maintained in human type O red blood cells 
(RBCs) (Etablissement Français du Sang) diluted to 2.5% haematocrit in RPMI-
1640 medium (Invitrogen, San Diego, CA) supplemented with 5% human serum.
Parasite cultures were maintained at 37 °C in an atmosphere of 5% CO2. Parasitaemia
was checked daily and maintained below 10%. For the selection of ART-resistant
parasites, asynchronous cultures were adjusted to 5-7% parasitaemia and grown in
the presence of escalating doses of artemisinin (from 10 nM to 9 µM) for 24 h for
the first 3 years of drug pressure. In the subsequent 2 years, each drug-pressure
cycle was done for 48 h with doses ranging from 9 µM to 18 µM. After drug exposure,
the medium was discarded and replaced by human-serum-supplemented (20%) 
drug-free medium. Parasitaemia was monitored daily until it reached 5%. At that 
time, drug pressure was reapplied. The parasite line obtained after an effective 
5 years of discontinuous ART pressure was named F32-ART5. In parallel, the 
parental F32-Tanzania line was kept as a control in continuous culture for the 
same time under the same conditions (that is, RBCs, serum and media) but without 
artemisinin exposure. The resulting control line was called F32-TEM. 
Laboratory-adapted P. falciparum lines. Reference DNA was extracted from the 
laboratory-adapted P. falciparum lines 3D7 (MR4, Manassas, VA), 89F5 Palo 
Alto Uganda (a clone from the Palo Alto line, originating from Uganda in 1978, 
which displayed high susceptibility to artemether treatment in the Saimiri sciureus 
experimental host (O. Mercereau-Puijalon, H. Contamin and J.-C. Barale, unpub- 
lished data)) and K1992, an isolate collected in Pailin in 1992 before the mass 
deployment of ART in that area (provided by the French National Reference 
Center of Malaria). Parasite DNA was extracted from frozen blood aliquots (200 µl)
using the Mini blood kit (Qiagen) according to the manufacturer’s instructions. 
Culture-adapted P. falciparum isolates from Cambodia. Fifty clinical P. falci- 
parum isolates from Cambodia (collected in 2010 and 2011) were adapted to in vitro 
culture as described in ref. 45. Their geographic origin is indicated in Supplementary 
Table 1. Parasite clearance rates were not determined for these patient isolates, as 
they were collected during field trials that did not include such measurements. 
Parasite DNA was extracted from frozen blood aliquots (200 µl) using the Mini 
blood kit (Qiagen). 

Ring-stage survival assay. The ring-stage survival assay (RSA0-3h) was carried 
out as described in ref. 13 using highly synchronous parasite cultures. In brief, 
0-3 h post-invasion ring-stage parasites were exposed to 700 nM DHA (dihydroartemisinin, 
obtained from WWARN (http://www.wwarn.org/research/tools/qaqc)) 
or its solvent DMSO for 6 h, washed and then cultivated for the next 66 h 
without drug. Survival rates were assessed microscopically by counting in Giemsa- 
stained thin smears the proportion of viable parasites that developed into second- 
generation rings or trophozoites with normal morphology. 

Ethical clearance. Ethical clearances for the collection of parasite isolates from 
patients were obtained from the Cambodian National Ethics Committee for 
Health Research, the Institutional Review Board of the Naval Medical Research 
Center, the Technical Review Group of the WHO Regional Office for the Western 
Pacific, and the Institutional Review Board of the National Institute of Allergy and 
Infectious Diseases. Work was conducted in compliance with all relevant ethical 
standards and regulations governing research involving human subjects. Written 
informed consent was obtained from all adult participants or the parents or 
guardians of children. 

Temporal and geographical sample collection. A set of 941 clinical P. falciparum 
isolates was collected from patients participating in therapeutic efficacy studies of 
ACTs, conducted as part of the routine antimalarial drug efficacy monitoring of 
Cambodia’s National Malaria Control Program from 2001 to 2012, and from 
studies conducted by NAMRU-2 (Extended Data Table 4). Venous blood samples 
(5 ml) collected in EDTA or ACD were transported to Institut Pasteur du Cambodge 
in Phnom Penh within 48 h of collection at 4 °C and then kept at −20 °C until 
DNA extraction. Parasite DNA was extracted from frozen blood aliquots (200 µl) 
using the Mini blood kit (Qiagen). 

Measurement of parasite clearance half-life. Patients with uncomplicated or 
severe P. falciparum malaria and initial parasite density ≥ 10,000 µl⁻¹ were enrolled 
in Pursat and Ratanakiri provinces in 2009 and 2010 as described®”’. Patients were 
treated with an ART and their parasite density measured every 6 h from thick 
blood films until parasitaemia was undetectable. The parasite clearance half-life in 
163 patients was derived from these parasite counts using WWARN’s on-line 
Parasite Clearance Estimator (http://www.wwarn.org/toolkit/data-management/ 
parasite-clearance-estimator). The study is registered at ClinicalTrials.gov (number 
NCT00341003). 

Whole-genome sequencing of parasite DNA. Whole-genome sequencing was 
performed on F32-Tanzania, F32-TEM, the F32-ART5 lineage (4 time-points), three 
reference strains (3D7, 89F5 and K1992) and 21 parasite isolates from Cambodia, 
using an Illumina paired-reads sequencing technology. Illumina library preparation 
and sequencing followed standard protocols developed by the supplier. Briefly, 
genomic DNA was sheared by nebulization, and sheared fragments were end- 
repaired and phosphorylated. Blunt-end fragments were A-tailed, and sequencing 
adapters were ligated to the fragments. Inserts were sized using Agencourt 
AMPure XP Beads (± 500 bp; Beckman Coulter Genomics) and enriched using 
10 cycles of PCR before library quantification and validation. Hybridization of the 
library to the flow cell and bridge amplification was performed to generate clusters, 
and paired-end reads of 100 cycles were collected on a HiSeq 2000 instrument 
(Illumina). After sequencing was complete, image analysis, base calling and error 
estimation were performed using Illumina Analysis Pipeline version 1.7. 

Raw sequence files were filtered using Fqquality tool, a read-quality filtering 
software developed by N. Joly, which enables the trimming of the first and last 
low-quality bases in reads. The trimmed reads from controlled Fastq files were 
mapped on a reference genome (P. falciparum 3D7) with the Burrows-Wheeler 
Alignment (BWA), generating a BAM file (a binary file of tab-delimited format 
SAM). Next, we used Samtools to prepare a pileup file, which was formatted using 
in-house software to implement the data into the Wholegenome Data Manager 
(WDM) database (Beghain et al., in preparation). WDM software is designed to 
compare and/or align partial or whole P. falciparum genomes. 

Sequencing genes containing non-synonymous SNPs in F32-ART5. PCR amplification 
of selected genes was performed using the primers listed in Extended 
Data Table 1. Two µl of DNA was amplified with 1 µM of each primer, 0.2 mM 
dNTP (Solis Biodyne), 3 mM MgCl₂ and 2 U Taq DNA polymerase (Solis Biodyne), 
using the following cycling program: 5 min at 94 °C, then 40 cycles of 30 s at 94 °C, 
90 s at 60 °C, 90 s at 72 °C and final extension 10 min at 72 °C. PCR products were 
detected by 2% agarose gel electrophoresis and ethidium bromide staining. Double- 
strand sequencing of PCR products was performed by Beckman Coulter Genomics. 
Sequences were analysed with MEGA 5 software version 5.10 in order to identify 
specific SNP combinations. 

Sequencing the K13-propeller domain. The K13-propeller domain was amplified 
using the following primers: for the primary PCR (K13-1 5'-cggagtgaccaaatctggga-3' 
and K13-4 5'-gggaatctggtggtaacagc-3') and the nested PCR (K13-2 5'-gccaagctgccattcatttg-3' 
and K13-3 5'-gccttgttgaaagaagcaga-3'). One µl of DNA was amplified 
with 1 µM of each primer, 0.2 mM dNTP (Solis Biodyne), 3 mM MgCl₂ and 
2 U Taq DNA polymerase (Solis Biodyne), using the following cycling program: 
5 min at 94 °C, then 40 cycles of 30 s at 94 °C, 90 s at 60 °C, 90 s at 72 °C and final 
extension 10 min at 72 °C. For the nested PCR, 2 µl of primary PCR products were 
amplified under the same conditions, except for the MgCl₂ concentration (2.5 mM). 
PCR products were detected using 2% agarose gel electrophoresis and ethidium 
bromide staining. Double-strand sequencing of PCR products was performed by 
Macrogen. Sequences were analysed with MEGA 5 software version 5.10 to identify 
specific SNP combinations. 

Deep-sequencing of clinical parasite isolates and population structure ana- 
lysis. DNA extraction, Illumina sequencing and SNP genotyping of clinical para- 
site isolates obtained from malaria patients in Pursat and Ratanakiri provinces, 
Cambodia, have been previously described’*. Population structure analysis of these 
parasites identified four subpopulations: KH1, KH2, KH3 and KH4. Parasites with 
<80% ancestry from any of these four groups were deemed admixed (KHA). 
Temporal acquisition of mutations in the F32-ART5 lineage. The F32-ART5 
lineage was explored by whole-genome sequencing using samples collected at 
time 0 (original F32-Tanzania clonal line), day 196 (0.2-µM pressure cycle no. 23), 
day 385 (1.8-µM pressure cycle no. 39), day 618 (9-µM pressure cycle no. 56) and 
day 2,250 (9-µM pressure cycle no. 120). The F32-TEM sample was collected 
on day 2,250. Additional samples collected at the time of the 30th, 33rd, 34th, 
36th, 68th and 98th pressure cycles were studied by PCR. DNA from parasite cultures 
was extracted using the High Pure PCR Template Preparation Kit (Roche Dia- 
gnostics) according to the manufacturer’s instructions. 

The F32-ART5 samples tested in the ring-stage survival assay (RSA0-3h) were 
collected at the time of the 17th, 48th and 122nd pressure cycles (0.04, 2.7 and 
9 µM ART, respectively). The F32-TEM sample was collected at the last mock 
pressure cycle. The RSA0-3h survival rates were determined in triplicate experiments 
with different batches of red blood cells, and evaluated as above using 
Giemsa-stained thin smears read by two independent microscopists (B.W. and 
F.B.-V.). Survival rates were compared using the Mann-Whitney U test. The RSA0-3h 
survival rates of the F32-ART5 samples at each drug-pressure cycle were as follows: no. 
17 (n = 3, median 0%, IQR 0-0.07), no. 48 (n = 3, median 11.7%, IQR 10.3-14.6; 
P = 0.04 for no. 17 versus no. 48, Mann-Whitney U test) and no. 122 (n = 3, 
median 12.8%, IQR 10.6-14.5; P = 0.04 and P = 0.82 for no. 17 versus no. 122 
and no. 48 versus no. 122, respectively). The RSA0-3h survival rate of the F32-TEM line was also 
determined in triplicate experiments (n = 3, median 0%, IQR 0-0.05; P = 0.81 for 
TEM versus no. 17, Mann-Whitney U test). 

Prevalence of K13-propeller mutations in 886 clinical parasite isolates collected 
in six Cambodian provinces in 2001-2012. The K13-propeller was genotyped by 
sequencing PCR products amplified from 886 archived blood samples. The number 
of samples analysed from each province each year is indicated in Fig. 3. Fisher’s 
exact test was used to compare the frequency of isolates harbouring a wild-type 
K13-propeller sequence in each province over time. A significant decrease in the 
frequency of the wild-type K13-propeller allele was observed in the western provinces 
during the decade. In Pailin, it decreased from 30.0% in 2001-2002 (12/40) to 4.8% 
in 2011-2012 (4/84), P = 0.0002; in Battambang, from 71.9% in 2001-2002 (46/64) 
to 7.0% in 2011-2012 (5/71), P < 10⁻⁶; in Pursat, from 50.0% in 2003-2004 (5/10) 
to 10.5% in 2011-2012 (2/19), P = 0.03; and in Kratie, from 93.3% in 2001-2002 
(14/15) to 29.4% in 2011-2012 (5/17), P = 0.0003. Significant decreases in wild-type 
allele frequency were not observed in Preah Vihear (from 92.6% in 2001-2002 
(25/27) to 84.2% in 2011-2012 (16/19), P = 0.63) or Ratanakiri (from 96.4% in 
2001-2002 (54/56) to 94.3% in 2011-2012 (33/35), P = 0.63). The frequency of 
C580Y increased in Pailin from 45.0% (18/40) in 2001-2002 to 88.1% (74/84) in 
2011-2012 (P < 10⁻⁶), and in Battambang from 7.8% (5/64) in 2001-2002 to 
87.3% (62/71) in 2011-2012 (P < 10⁻⁶), indicating its rapid invasion of the population 
and near fixation in these provinces. 
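
The province-level comparisons above are 2 × 2 tables (wild-type versus mutant isolates, early versus late period), so they can be re-derived from the counts given in the text. The following is a minimal sketch of a two-sided Fisher's exact test in pure Python, not the software used in the study (the authors report using MedCalc); the function name fisher_exact_two_sided is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption that may differ slightly from other implementations.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration; sums the hypergeometric
    probabilities of all tables that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Wild-type vs mutant counts taken from the text above.
p_pailin = fisher_exact_two_sided(12, 40 - 12, 4, 84 - 4)      # Pailin
p_ratanakiri = fisher_exact_two_sided(54, 56 - 54, 33, 35 - 33)  # Ratanakiri
print(f"Pailin: P = {p_pailin:.2g}; Ratanakiri: P = {p_ratanakiri:.2g}")
```

Under these assumptions the Pailin comparison falls well below the 0.05 threshold and the Ratanakiri comparison does not, in line with the reported results.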

Three-dimensional structure modelling of the K13-propeller. The 3D-structural 
model of the kelch propeller domain of PF3D7_1343700 (‘K13-propeller’) was 
obtained by homology modelling satisfying spatial restraints using Modeller 
v9.11 (http://modbase.compbio.ucsf.edu). The 295 amino acids composing the 
K13-propeller are 22%, 25% and 25% identical to the kelch propeller domain of 
the human KEAP1 (Protein Data Bank (PDB; http://www.rcsb.org/) 2FLU), 
KLHL12 (PDB 2VPJ) and KLHL2 (PDB 2XN4) proteins, respectively, that were 
used as templates to model the 3D-structure of the K13-propeller. The reliability of 
the obtained model was assessed using two classical criteria. First, the significance 
of the sequence alignment between the K13-propeller domain and one template 
was confirmed by an E-value = 0, as calculated by Modeller using the Built-Profile 
routine. Second, the model achieved a GA341 model score = 1 (a score ≥ 0.7 
corresponds to highly reliable models). Localization of the mutants in the K13-propeller 
3D-model was prepared using the PYMOL Molecular Graphics System, 
version 1.5.0.4 (Schrödinger; http://www.pymol.org). 

Statistical analysis. Data were analysed with Microsoft Excel and MedCalc ver- 
sion 12. Quantitative data were expressed as median, interquartile range (IQR). 
The Mann-Whitney U test (independent samples, two-sided) was used to com- 
pare two groups, and the Kruskal-Wallis test (H-test, two-sided) was used to 
compare more than two groups. The Spearman’s rho rank correlation coefficient 
(and the 95% confidence interval for the correlation coefficient) was used to 
measure the strength of relationship between the prevalence of wild-type K13- 
propeller allele and the frequency of day 3 positivity (defined as persistence of 
microscopically detectable parasites on the third day of artemisinin-based com- 
bination therapy)’. Fisher’s exact test was used to compare frequency data and the 
Clopper-Pearson exact method based on the beta distribution was used to determine 
95% confidence intervals for proportions. Differences were considered statistically 
significant when P values were less than 0.05. 
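
As an illustration of the Clopper-Pearson exact interval named above, the sketch below computes the bounds by bisection on binomial tail probabilities, which is equivalent to the beta-distribution formulation but needs only the standard library. It is an illustrative sketch, not the MedCalc routine used in the study; the helper names binom_cdf and clopper_pearson are hypothetical. The worked example uses the Pailin 2011-2012 counts from the text (4 wild-type isolates out of 84, hence 80 mutant).

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for k successes in n trials.

    Hypothetical helper: solves the binomial tail equations by bisection,
    relying on the binomial CDF being decreasing in p.
    """

    def solve(cdf_of_p, target):
        low, high = 0.0, 1.0
        for _ in range(100):
            mid = (low + high) / 2
            if cdf_of_p(mid) > target:
                low = mid   # CDF still too large, so p must increase
            else:
                high = mid
        return (low + high) / 2

    # Lower bound: P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Mutant K13-propeller alleles in Pailin, 2011-2012: 80 of 84 isolates.
low, high = clopper_pearson(80, 84)
print(f"{80 / 84:.0%} ({low:.0%}-{high:.0%})")
```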
Extended Data Figure 1 | SNP-calling algorithm and sequence and coverage 
of SNPs. a, SNP-calling algorithm of the whole-genome sequence comparison 
of F32-ART5 and F32-TEM. b, Sequence and coverage of SNPs in seven 
candidate genes differing in F32-TEM and F32-ART5. 

a, SNP-calling algorithm 

No. reads F32-ART5: 2 × 69,513,592; mean coverage: 460x 
No. reads F32-TEM: 2 × 77,619,877; mean coverage: 500x 

Differences between exomes of F32-ART5 and F32-TEM after excluding genes known to be highly variable (var, rifin and stevor multigene families) 
    23,582 SNPs 
Coverage of position > 25% of mean coverage of isolate (i.e. > 92 for F32-ART5; > 100 for F32-TEM) 
    4,381 SNPs 
Exclude SNPs shared between F32-ART5 and 3D7 
    1,281 SNPs 
SNP with frequency > 80% of position coverage in F32-ART5 
    59 SNPs 
Exclude SNPs present in F32-ART5 and mixed in F32-TEM 
    (42 genes) 
    Synonymous SNPs: 2 genes 
    Non-synonymous SNPs: 7 genes 

b, Sequence and coverage of SNPs 

                                                                        F32-ART5                                F32-TEM 
Position  Gene ID; annotation                                           No. reads  Mutant reads  % mutant    No. reads  wt reads  % wt 
394452    PF3D7_0110400; DNA-directed RNA polymerase 2, putative              224           222     99.11          335       334   99.70 
542625    PF3D7_0213400; protein kinase 7 (PK7)                               242           242    100.00          403       403  100.00 
593379    PF3D7_1115700; cysteine proteinase falcipain 2a                     234           231     98.72          290         -   99.66 
121689    PF3D7_1302100; gamete antigen 27/25 (Pfg27)                         261           259     99.23          343         -   99.71 
1725570   PF3D7_1343700; kelch protein, putative                             1004          1004    100.00         1161         -   99.91 
2442240   PF3D7_1459600; conserved Plasmodium protein, unknown                165           142     86.06          225         -  100.00 
2612177   PF3D7_1464500; conserved Plasmodium membrane protein, unknown       401           399     99.50          428       428  100.00 
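
The two numeric filters in panel a (a minimum read depth at the position and a mutant-allele frequency above 80% of position coverage) can be sketched as a simple predicate. This is an illustrative sketch only: the study used in-house software and the WDM database, and the names SnpCall and passes_filters are hypothetical. The read counts are the F32-ART5 values from panel b, and the depth threshold of 92 is the one quoted in panel a.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """One candidate SNP in F32-ART5 (hypothetical record type)."""
    position: int
    gene_id: str
    total_reads: int
    mutant_reads: int

    @property
    def mutant_fraction(self) -> float:
        return self.mutant_reads / self.total_reads


def passes_filters(call, min_reads=92, min_mutant_fraction=0.80):
    """Keep a SNP only if depth and mutant-allele frequency exceed the thresholds."""
    return call.total_reads > min_reads and call.mutant_fraction > min_mutant_fraction


# F32-ART5 read counts from Extended Data Figure 1b.
calls = [
    SnpCall(394452, "PF3D7_0110400", 224, 222),
    SnpCall(542625, "PF3D7_0213400", 242, 242),
    SnpCall(593379, "PF3D7_1115700", 234, 231),
    SnpCall(121689, "PF3D7_1302100", 261, 259),
    SnpCall(1725570, "PF3D7_1343700", 1004, 1004),
    SnpCall(2442240, "PF3D7_1459600", 165, 142),
    SnpCall(2612177, "PF3D7_1464500", 401, 399),
]

retained = [c for c in calls if passes_filters(c)]
for c in retained:
    print(f"{c.gene_id}: {100 * c.mutant_fraction:.2f}% mutant reads")
```

All seven candidate positions pass both thresholds; the weakest, PF3D7_1459600, has 142 of 165 reads carrying the mutant allele.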
Extended Data Figure 2 | Geographic distribution of K13-propeller alleles 
in Cambodia in 2011-2012. Pie charts show K13-propeller allele frequencies 
among 300 parasite isolates in ten Cambodian provinces. Pie sizes are 
proportional to the number of isolates and the different alleles are colour-coded 
as indicated. The frequencies (95% confidence interval) of mutant K13-propeller 
alleles are: Pailin (95%, 88-99, n = 84), Battambang (93%, 87-99, 
n = 71), Pursat (89%, 67-99, n = 19), Kampot (83%, 52-98, n = 12), Kampong 
Som (71%, 29-96, n = 7), Oddar Meanchey (76%, 58-89, n = 33), Preah 
Vihear (16%, 3-40, n = 19), Kratie (71%, 44-90, n = 17), Mondulkiri (67%, 
9-99, n = 3) and Ratanakiri (6%, 1-19, n = 35). 
Extended Data Figure 3 | Correlation between the frequency of wild-type 
K13-propeller alleles and the prevalence of day 3 positivity after ACT 
treatment in eight Cambodian provinces. The frequency of day 3 positivity is 
plotted against the frequency of wild-type K13-propeller alleles. Data are 
derived from patients treated with an ACT for P. falciparum malaria in 2010-2012 
in eight Cambodian provinces (Extended Data Figure 2): Pailin (n = 86, 
2011 WHO therapeutic efficacy study, artesunate-mefloquine); Pursat (n = 32, 
2012 WHO therapeutic efficacy study, dihydroartemisinin-piperaquine); 
Oddar Meanchey (n = 32, 2010 NAMRU-2 therapeutic efficacy study, 
artesunate-mefloquine); Kampong Som/Speu (n = 7, 2012 WHO therapeutic 
efficacy study, dihydroartemisinin-piperaquine); Battambang (n = 18, 
2012 WHO therapeutic efficacy study, dihydroartemisinin-piperaquine); 
Kratie (n = 15, 2011 WHO therapeutic efficacy study, dihydroartemisinin-piperaquine); 
Preah Vihear (n = 19, 2011 WHO therapeutic efficacy study, 
dihydroartemisinin-piperaquine); Ratanakiri (n = 32, 2010 WHO therapeutic 
efficacy study, dihydroartemisinin-piperaquine). Spearman’s coefficient of 
rank correlation (8 sites): r = −0.99, 95% confidence interval −0.99 to −0.96, 
P < 0.0001. 
Front Oncol. 2012;2:200. doi: 10.3389/fonc.2012.00200. Epub 2012 Dec 26. 
The Keap1-Nrf2 system in cancers: stress response and anabolic metabolism. 
Mitsuishi Y, Motohashi H, Yamamoto M 


Extended Data Figure 4 | Schematic representation of homology between 
P. falciparum K13 and human KEAP1 proteins and structural 3D model of 
the K13-propeller domain. a, Schematic representation of the predicted 
PF3D7_1343700 protein and homology to human KEAP1. Similar to KEAP1, 
PF3D7_1343700 contains a BTB/POZ domain and a C-terminal 6-blade 
propeller, which assembles kelch motifs consisting of four anti-parallel beta 
sheets. b, Structural 3D model of the K13-propeller domain showing the six 
kelch blades numbered 1 to 6 from N to C terminus and colour-coded as in 
Supplementary Fig. 1. The level of amino-acid identity between the K13- 
propeller and kelch domains of proteins with solved 3D structures, including 
human KEAP1**’, enabled us to model the 3D structure of the K13-propeller 
and to map the mutations selected under ART pressure (Extended data 
Table 5). The accuracy of the K13-propeller 3D model was confirmed by 
Modeller-specific model/fold criteria of reliability (see Methods). We predict 
that the K13-propeller folds into a 6-bladed β-propeller structure** closed by 
the interaction between a C-terminal beta-sheet and the N-terminal blade***. 
The first domain has three β-sheets, the fourth one being contributed by an 
extra C-terminal β-sheet called β′1 in Supplementary Fig. 1. The human 
KEAP1 kelch propeller scaffold is destabilized by a variety of mutations 
affecting intra- or inter-blade interactions in human lung cancer** and 
hypertension’. The positions of the various mutations are indicated by a 
sphere, colour-coded as in Figs 2-4. The M476 residue mutated in F32-ART5 
is indicated in dark grey. Like the mutations observed in human KEAP1**”’, 
many K13-propeller mutations are predicted to alter the structure of the 
propeller or modify surface charges, and as a consequence alter the biological 
function of the protein. Importantly, the two major mutations C580Y (red) and 
R539T (blue) observed in Cambodia are both non-conservative and located 
in organized secondary structures: a β-sheet of blade 4 where it is predicted to 
alter the integrity of this scaffold and at the surface of blade 3, respectively. The 
kelch propeller domain of KEAP1 is involved in protein-protein interactions 
like most kelch containing modules*’. KEAP1 is a negative regulator of the 
inducible Nrf2-dependent cytoprotective response, sequestering Nrf2 in 
the cytoplasm under steady state. Upon oxidative stress, the Nrf2/KEAP1 
complex is disrupted, and Nrf2 translocates to the nucleus, where it induces 
transcription of cytoprotective ARE-dependent genes*””*. We speculate that 
similar functions may be performed by PF3D7_1343700 in P. falciparum, such 
that mutations of the K13-propeller impair its interactions with an unknown 
protein partner, resulting in a deregulated anti-oxidant/cytoprotective 
response. The P. falciparum anti-oxidant response is maximal during the late 
trophozoite stage, when haemoglobin digestion and metabolism are highest*!. 
Its regulation is still poorly understood and no Nrf2 orthologue could be 
identified in the Plasmodium genome. 
Extended Data Table 1 | Sequence of the primers used to amplify the genes containing nonsynonymous single-nucleotide polymorphisms in F32-ART5 


Targeted gene Primer forward sequence Primer reverse sequence 

PF3D7_0110400 5'-ttgagcttctttttcccaataatggce-3' 5'-tgatatatgtttgtaggagctgtgag-3' 
PF3D7_0213400 5'-gtgaaaaggataataaattctatgac-3' 5'-tatctaccatatattctgattctcc-3' 
PF3D7_1115700 5'-agcaagaacgttttgtgtaaa-3' 5'-gaattctttaatggttttgaagat-3' 
PF3D7_1302100 5'-taatatgtaaagtgattatgtatatcgc-3' 5'-atgctagagaagttaaagagaagaagcg-3' 
PF3D7_1343700 5'-agaagagccatcatatccccc-3' 5'-agtggaagacatcatgtaaccag-3' 
PF3D7_1459600 5'-atatgagtaaaatgtcaggttttgg-3' 5'-tgettgttgtgattcatgggg-3' 
PF3D7_1464500 5'-aaatagttgggcgtagctcag-3' 5'-tatcacaattaagtgtatcacaacg-3' 
Extended Data Table 2 | Description of the eight nonsynonymous, single-nucleotide polymorphisms acquired in the F32-ART5 compared 
to the F32-TEM lineage during an effective 5-year discontinuous exposure to increasing concentrations of artemisinin 


Codon F32-ART5 lineage 


Chromosome/ pucieoude D le # 
Gene ID : bets position in rug pressure cycie Mutant 
Annotation position TEM 
(eeeod’ le cong ee ee ee oa 
mutated sequence | 2 | 8 
0.2 uM 1.8 uM 9 uM 9 uM 
ART* ART* ART* ART* 


DNA-directed 
RNA 
PF3D7_0110400 polymerase 2 01/39452 173 gAt gAt gAt gTg gTg gig D56V 
complex subunit 
RPB9, putative 


kelch protein, 
PF3D7_1343700# putative, called 13/1725570 1428 atG atG atG M476 
here ‘K13’ 


protein kinase 7 02/542625 TaG 
(PK7) 02/542627 


cysteine 
PF3D7_1115700 proteinase 11/593378 206 tCa tCa tCa tCa tCa S69stop 
falcipain 2a 


PF3D7_0213400 


PF3D7_1302100 prtetapla Femme | oo | Cca Cca Cca Cca Cca poe | P201T 
conserved 
Plasmodium 

PF3D7_1459600* protein, 14/2442240 896 aGt aGt aGt aGt aGt $299T 
unknown 
function 


conserved 
Plasmodium 


PF3D7_1464500 pane 14/2612177 4886 aAt aAt aAt aAt aAt aGt N1629S 
unknown 


function 


# 3D7-type sequence; the same codon sequence is also observed in the parental F32-Tanzania line. 
* Artemisinin (ART) dose used for selection during the corresponding drug-pressure cycle. 
*Genes found in the chromosomal location of top-ranked signatures of selection in ref. 16. 
Extended Data Table 3 | Reported characteristics of the genes mutated in F32-ART5 parasites 


PF3D7_0110400 (PFA0505c), is a two-exon gene, codes for the RNA Polymerase II subunit 9 (RPB9), a small integral Pol II subunit, which is 
highly conserved among eukaryotes. The yeast RPB9 ortholog has been shown to have a role in assuring the fidelity of transcription in vivo. 
Deletion of the gene results in error-prone transcription”. The protein has a predicted zinc ribbon domain similar to the zinc ribbon domain of 
TFIIS (RNA Polymerase II elongation factor) that contains the essential catalytic Asp-Glu dipeptide’®. Very little is known about the protein in 
Plasmodia, although the gene is expressed and the protein is present in blood stages (www.plasmodb.org). It is difficult to make any prediction on 
the possible phenotypic consequences of the D56V mutation, which is located in a Plasmodium-specific, well-conserved domain. 


PF3D7_1343700 (PF13_0238), is a one-exon gene (called here K13) that codes for a putative kelch protein. K13 has a predicted 3-domain 
structure, with an approx. 225 residue long, Plasmodium-specific and well conserved N-terminal domain, followed by a BTB/POZ domain and a 
6-blade C-terminal propeller domain formed of canonical kelch motifs**“®. Little is known about the protein in malaria parasites. Proteomics data 
indicate that it is produced by asexual (trophozoites, schizonts, merozoites and rings) and sexual blood stages (gametocytes) of P. falciparum, and 
that it possesses phosphorylated residues in the N-terminal Plasmodium-specific domain (www.plasmodb.org). The M476I mutation is located 
between the first and second blade of the propeller domain. 


PF3D7_0213400 (PFB0605w), is a four-exon gene that codes for protein kinase 7 (PK7) expressed during the asexual blood stage development, 
in gametocytes and ookinetes. The E104 stop mutation (two SNPs affecting the same codon) observed in F32-ART5 interrupts the gene resulting 
in a truncated putative translation product lacking more than 2/3 of its sequence. Studies with genetically inactivated parasites have shown that 
PK7-KO P. falciparum parasites have an asexual growth defect due to a reduced number of merozoites per schizont™. Furthermore, PK7 is 
important for mosquito transmission, with a collapsed number of ookinetes in P. falciparum™ and in P. berghei, where no sporoblasts and 
consequently no sporozoites are formed”. This transmission defective phenotype is unlikely to survive in the field. 


PF3D7_1115700 (PF11_0165), is a one-exon gene that codes for falcipain 2a, a cysteine proteinase produced by maturing blood stages 
(trophozoites and schizonts) and involved in hemoglobin degradation. The S69stop mutation located in the pro-enzyme region precludes 
expression of an active enzyme by F32-ART5 parasites. Gene inactivation has been shown to induce a transient reduction of hemoglobin degradation 
compensated by expression of other members of the cysteine proteinases family, with minimal impact on growth rate°”>. However, falcipain 2a is 
the only gene from the list of seven affected loci that has been associated with the in vitro response to artemisinin. Indeed, it has been 
convincingly shown that inhibition of falcipain2a-dependent hemoglobin digestion by specific inhibitors or by gene inactivation reduced parasite 
susceptibility to artemisinins'®. Moreover, ring stages that do not massively digest hemoglobin display a reduced susceptibility to artemisinins®. 


PF3D7_1302100 (PF13_0011), is a one-exon gene that codes for the gamete antigen 27/25 (Pfg27) produced at the onset of gametocytogenesis. 
The gene is specific to P. falciparum and its close relatives such as P. reichenowi. This is an abundant, dimeric phosphorylated cytoplasmic protein 
that binds RNA. The various KO lines generated display conflicting phenotypes, some being deficient in gametocytogenesis™, while other 
Pfg27-defective lines undergo unimpaired gametocytogenesis up to stage V mature gametocytes, although absence of Pfg27 is associated with 
abnormalities in intracellular architecture of gametocytes®’. The crystal structure shows that the protein forms a dimer, displays a particular RNA 
binding fold and possesses two Pro-X-X-Pro motifs (known ligands for various domains, including SH3 modules), which combine to form a 
receptacle for SH3 modules. The P201T mutation is located in the C-terminal Pro-X-X-Pro motif and predicted to alter the spatial structure of the 
interaction domain and thus have functional consequences. 


PF3D7_1459600 (PF14_0569), is a two-exon gene that codes for a 806 residue-long, conserved protein of unknown function. The P. yoelii 
ortholog has been annotated as the CAAT-box DNA binding subunit B. Close orthologs can be found only among the Plasmodium species. 
Proteomics data indicate that the protein is present in asexual (trophozoites, schizonts, merozoites and rings) and sexual (gametocytes) blood 
stages of P. falciparum. A predicted approx. 130 aa-long Interpro domain suggests presence of an N-terminal multi-helical, alpha-alpha 2-layered 
structural VHS fold, possibly involved in intracellular membrane trafficking. The rest of the coding sequence carries no specific domain signature. 
The S299T mutation is located within this "unknown" region. 


PF3D7_1464500 (PF14_0603), is a five-exon gene that codes for a 3251 residue-long protein of unknown function, with 4 predicted trans- 
membrane domains, but otherwise no specific domain signature. Apart from proteomics data indicating its expression and phosphorylation in 
schizonts, with possible expression in gametocytes and sporozoites as well, little is known about its putative function. The N1629S mutation is 
located in the middle of the protein, with unpredictable phenotypic impact. 
Extended Data Table 4 | Geographic origin and year of collection of archived blood samples studied for K13-propeller polymorphism 

                                                          Year of collection 
Region              Province         2001-2002  2003-2004  2005-2006  2007-2008  2009-2010  2011-2012  Total 
Western Cambodia    Battambang              64          0          0          0          0         71    135 
                    Pailin                  40         43         46         95         66         84    374 
                    Pursat                   0         10          0          0         43         19     72 
Southern Cambodia   Kampot                   0          0          0          0          0         12     12 
                    Kampong Som              0          0          0          0          0          7      7 
Northern Cambodia   Oddar Meanchey           0          0          0          0          0         33     33 
                    Preah Vihear            27         27         25         24          0         19    122 
Eastern Cambodia    Kratie                  15          0          0          0          0         17     32 
                    Mondulkiri               0          0          0          0          0          3      3 
                    Ratanakiri              56         30         22          0          8         35    151 
Total                                      202        110         93        119        117        300    941 
Extended Data Table 5 | Polymorphisms observed in the K13-propeller in Cambodian P. falciparum isolates collected in 2001-2012 and in 
The Gambia (ref. 42) 

Codon      Amino acid   Nucleotide   Amino acid   Nucleotide 
position   reference    reference    mutation     mutation 
449        G            ggt          A            gCt 
458        N            aat          Y            Tat 
474        T            aca          I            aTa 
476*       M            atg          I            atA 
481        A            gct          V            gTt 
493        Y            tac          H            Cac 
508        T            act          N            aAt 
527        P            cct          T            Act 
533        G            ggt          S            Agt 
537        N            aat          I            aTt 
539        R            aga          T            aCa 
543        I            att          T            aCt 
553        P            ccg          L            cTg 
561        R            cgt          H            cAt 
568        V            gtg          G            gGg 
574        P            cct          L            cTt 
580        C            tgt          Y            tAt 
584        D            gat          V            gTt 
612**      E            gaa          D            gaT 
623        S            agt          C            Tgt 

* Observed in F32-ART5, not observed in Cambodia 
** Reported in The Gambia (ref. 42), not observed in Cambodia 
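
The amino-acid changes in Extended Data Table 5 follow directly from the listed codons under the standard genetic code, which gives a quick consistency check on the table. The sketch below is illustrative only; the helper name describe_substitution is hypothetical, and the codon table is the standard one written in the usual TCAG ordering.

```python
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def describe_substitution(position, ref_codon, mut_codon):
    """Return a label such as 'C580Y' from a codon position and two codons."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    mut_aa = CODON_TABLE[mut_codon.upper()]
    return f"{ref_aa}{position}{mut_aa}"


# Codons taken from Extended Data Table 5 (capital letter = mutated base).
examples = [(580, "tgt", "tAt"), (539, "aga", "aCa"), (493, "tac", "Cac"), (476, "atg", "atA")]
labels = [describe_substitution(*row) for row in examples]
print(labels)
```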
Extended Data Table 6 | Association between polymorphisms observed in the K13-propeller and KH subpopulations (ref. 15) in 150 P. falciparum isolates collected in 2009-2010 in Pursat (n = 103) and Ratanakiri (n = 47) provinces, Cambodia 

                          Mutations in the K13-propeller 
KH group   Province      Wild type   C580Y   R539T   Y493H   Total 
KH1        Pursat                7       0       0       2       9 
           Ratanakiri           46       0       0       0      46 
KH2        Pursat                0      25       0       1      26 
           Ratanakiri            0       0       0       0       0 
KH3        Pursat                3       7       4       0      14 
           Ratanakiri            0       0       0       0       0 
KH4        Pursat                0       0       0      12      12 
           Ratanakiri            0       0       0       0       0 
KHA        Pursat               15      19       2       6      42 
           Ratanakiri            1       0       0       0       1 
Total                           72      51       6      21     150 
Structural basis for Ca²⁺ selectivity of a 
voltage-gated calcium channel 


Lin Tang’*, Tamer M. Gamal El-Din", Jian Payandeh't, Gilbert Q. Martinez', Teresa M. Heard', Todd Scheuer’, Ning Zheng'? 


& William A. Catterall! 


Voltage-gated calcium (CaV) channels catalyse rapid, highly selective influx of Ca²⁺ into cells despite a 70-fold higher 
extracellular concentration of Na⁺. How CaV channels solve this fundamental biophysical problem remains unclear. Here 
we report physiological and crystallographic analyses of a calcium selectivity filter constructed in the homotetrameric 
bacterial NaV channel NaVAb. Our results reveal interactions of hydrated Ca²⁺ with two high-affinity Ca²⁺-binding sites 
followed by a third lower-affinity site that would coordinate Ca²⁺ as it moves inward. At the selectivity filter entry, Site 1 
is formed by four carboxyl side chains, which have a critical role in determining Ca²⁺ selectivity. Four carboxyls plus four 
backbone carbonyls form Site 2, which is targeted by the blocking cations Cd²⁺ and Mn²⁺, with single occupancy. The 
lower-affinity Site 3 is formed by four backbone carbonyls alone, which mediate exit into the central cavity. This pore 
architecture suggests a conduction pathway involving transitions between two main states with one or two hydrated 
Ca²⁺ ions bound in the selectivity filter and supports a ‘knock-off’ mechanism of ion permeation through a stepwise-binding 
process. The multi-ion selectivity filter of our CaVAb model establishes a structural framework for understanding 
the mechanisms of ion selectivity and conductance by vertebrate CaV channels. 


Ca²⁺ ions flow through CaV channels at a rate of ~10⁶ ions s⁻¹, yet Na⁺ 
conductance is >500-fold lower’. Such high-fidelity, high-throughput 
CaV channel performance is important in regulating intracellular processes 
such as contraction, secretion, neurotransmission and gene expression 
in many different cell types”. Because the extracellular concentration 
of Na⁺ is 70-fold higher than Ca²⁺, these essential biological functions 
require CaV channels to be highly selective for Ca²⁺ in preference to 
Na⁺, even though Ca²⁺ and Na⁺ have nearly identical diameters (~2 Å). 
Ion selectivity of CaV channels is proposed to result from high-affinity 
binding of Ca²⁺, which prevents Na⁺ permeation. Fast Ca²⁺ flux through 
CaV channels is thought to use a ‘knock-off’ mechanism in which electrostatic 
repulsion between Ca²⁺ ions within the selectivity filter overcomes 
tight binding of a single Ca²⁺ ion'**. Most of these mechanisms 
require a multi-ion pore, yet extensive mutational analyses of ion selectivity 
and cation block of vertebrate CaV channels support a single high-affinity 
Ca²⁺-binding site’, 

Cav channels contain a single ion-selective pore in the centre of four
homologous domains. The central pore is lined by the transmembrane
segments (S) S5 and S6 and the intervening ‘Pore (P)-loop’ from each
domain in a four-fold pseudosymmetrical arrangement. The four
voltage-sensing modules composed of S1–S4 transmembrane helices are
symmetrically arranged around the central pore. Cav channels are members
of the voltage-gated ion channel superfamily and are closely related to
voltage-gated Na⁺ (Nav) channels. Three structures of homotetrameric
bacterial Nav channels open the way to elucidating the structural
basis for ion selectivity and conductance of vertebrate Nav and
Cav channels, which probably evolved from the bacterial NaChBac
family and retained similar structures and functions (Supplementary
Fig. 1). Interestingly, mutation of three amino-acid residues in the
selectivity filter of NaChBac is sufficient to confer Ca²⁺ selectivity.
We introduced analogous mutations into the bacterial Nav channel
NavAb to create CavAb and carried out electrophysiological and X-ray
crystallographic analyses to determine the relative permeability of Ca²⁺
and define ion-binding sites in the selectivity filter. Our systematic
analyses of CavAb and intermediate derivatives provide structural and
mechanistic insights into Ca²⁺ binding and ion permeation and suggest
a conductance mechanism involving two energetically similar
ion-occupancy states with one or two hydrated Ca²⁺ ions bound.


Structure and function of CavAb


NavAb channels have four identical pore motifs (¹⁷⁵TLESWSM¹⁸¹)
that form the ion selectivity filter. The side chains of E177 form a
high-field-strength site (SiteHFS) at the outer end of the filter, whereas
two additional potential Na⁺-coordination sites, a central site (SiteCEN)
and an inner site (SiteIN), are formed by the backbone carbonyls
of L176 and T175 (ref. 15). To create CavAb, E177, S178 and M181
were substituted with Asp, resulting in a mutant with the pore motif
¹⁷⁵TLDDWSD¹⁸¹ (Asp substitutions at positions 177, 178 and 181). CavAb
was expressed in Trichoplusia ni cells (High5) and analysed by
whole-cell voltage clamp to determine its ion selectivity. In contrast to NavAb,
which does not conduct extracellular Ca²⁺ ions but carries outward
Na⁺ current (Fig. 1a, b), CavAb conducts inward Ca²⁺ current in a
voltage-dependent manner (Fig. 1c, d). Complete titration curves for
Ca²⁺ in the presence of Ba²⁺ as the balancing divalent cation (see
Methods) revealed inhibition of Ba²⁺ current by low concentrations
of Ca²⁺ followed by increases in Ca²⁺ current at higher Ca²⁺
concentrations (Fig. 1e). These results demonstrate the anomalous mole fraction
effect characteristic of vertebrate Cav channels. Comparable
experiments with Na⁺ as the balancing cation were not possible because of
the instability of the High5 cells in solutions with low divalent cation
concentrations. The reversal potential for Ca²⁺ current under bi-ionic
conditions closely follows the expectation for a highly Ca²⁺-selective
conductance (30.6 ± 2.3 mV decade⁻¹, Fig. 1f and Supplementary
Fig. 2), and CavAb selects Ca²⁺ 382-fold over Na⁺ under our standard

Figure 1 | Structure and function of the CavAb channel. a, b, Outward Na⁺
current conducted by NavAb with 10 mM extracellular Ca²⁺ and 140 mM
intracellular Na⁺. Holding potential, −100 mV; 20-ms, 10-mV step
depolarizations. c, d, Voltage-dependent conductance of inward Ca²⁺ current
by CavAb under the same conditions. 20-ms, 5-mV step depolarizations.
e, Biphasic anomalous mole fraction effect of increasing Ca²⁺ as indicated, with
Ba²⁺ as the balancing divalent cation: 10 mM Ba²⁺ with 0 to 0.5 mM Ca²⁺,
9.3 mM Ba²⁺ with 0.7 mM Ca²⁺, and 0 mM Ba²⁺ with 10 mM Ca²⁺


recording conditions, yielding a range of the permeability (P) ratio
P_Ca:P_Na > 10,000-fold for these constructs (Fig. 1g). Intermediate CavAb
derivatives with single and double Asp substitutions had progressive
increases in Ca²⁺ selectivity (Fig. 1g and Supplementary Fig. 2), as
observed for NaChBac. The ¹⁷⁵TLDDWSN¹⁸¹ mutant has an Asn residue
in place of the final Asp, as observed in one domain of mammalian
Cav channels (Supplementary Fig. 1), and it still favours Ca²⁺ over Na⁺
by more than 100-fold (Fig. 1g).
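
The reversal-potential argument can be made concrete with a short calculation. The sketch below is illustrative only: it assumes a recording temperature of 22 °C, ignores activity coefficients and surface charge, and uses the textbook Goldman-Hodgkin-Katz current equation for the bi-ionic case (Ca²⁺ outside, Na⁺ inside), which may differ in detail from the equation used for Fig. 1g. The function names and the sample reversal potentials are hypothetical. For a divalent cation the ideal Nernst slope near room temperature is about 29 mV per decade, which is what the measured 30.6 ± 2.3 mV per decade is being compared against.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def nernst_slope_mv_per_decade(z, temp_c=22.0):
    """Ideal Nernst slope (mV per tenfold concentration change) for valence z."""
    temp_k = temp_c + 273.15  # assumed temperature, not stated in the text
    return 1000.0 * math.log(10.0) * R * temp_k / (z * F)


def p_ca_over_p_na(e_rev_mv, ca_out_mm, na_in_mm, temp_c=22.0):
    """Bi-ionic P_Ca/P_Na from the GHK current equation.

    Assumes Ca2+ is the only permeant ion outside and Na+ the only one
    inside, and ignores activity coefficients and surface charge.
    """
    temp_k = temp_c + 273.15
    x = math.exp(e_rev_mv * 1e-3 * F / (R * temp_k))
    return na_in_mm * x * (1.0 + x) / (4.0 * ca_out_mm)


if __name__ == "__main__":
    print(f"divalent slope: {nernst_slope_mv_per_decade(2):.1f} mV per decade")
    # Hypothetical reversal potentials (mV), with the 10 mM external Ca2+ and
    # 140 mM internal Na+ quoted in the Figure 1 legend.
    for e_rev in (0.0, 25.0, 50.0):
        ratio = p_ca_over_p_na(e_rev, ca_out_mm=10.0, na_in_mm=140.0)
        print(f"E_rev = {e_rev:5.1f} mV -> P_Ca/P_Na ~ {ratio:.0f}")
```

Because the ratio grows roughly as the square of exp(E_rev·F/RT), modest shifts in reversal potential translate into large changes in the apparent permeability ratio.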

We crystallized and determined the structure of CavAb and its derivatives
by molecular replacement using the NavAb structure (PDB code
3RVY) as the search template (Supplementary Table 1). The overall
structure of CavAb is very similar to that of NavAb, with a root mean
squared deviation (r.m.s.d.) of 0.4 Å (Fig. 1h). However, the
electrostatic potential at the outer entry to the selectivity filter is more negative
for CavAb than for NavAb (Supplementary Fig. 3). The three
negatively charged Asp residues introduced at the selectivity filter of CavAb
create a wide, short, electronegatively lined pore (6 Å diameter, 10 Å
length) with no significant alteration in backbone structure with respect
to NavAb (Fig. 1i, j and Supplementary Fig. 4). Thus, the Ca²⁺
selectivity of CavAb is mainly determined by the side chains of the amino
acids at the selectivity filter.
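
For readers unfamiliar with the r.m.s.d. figure quoted above, the following minimal sketch shows how it is defined for two structures that have already been superimposed. It is not the procedure used by the authors (superposition software also optimizes the rotation and translation first); the function name and the coordinates are hypothetical and chosen only to illustrate the formula.

```python
import math


def rmsd(coords_a, coords_b):
    """R.m.s.d. between two equal-length lists of (x, y, z) positions in Å.

    Assumes the two structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates for illustration only (not taken from 3RVY).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
mutant = [(0.0, 0.0, 0.3), (1.5, 0.4, 0.0), (1.5, 1.5, 0.0)]
print(f"r.m.s.d. = {rmsd(reference, mutant):.2f} Å")
```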


Ca²⁺-binding sites in the permeation pathway

The 3.2 Å resolution structure of the mutant ¹⁷⁵TLDDWSN¹⁸¹ in the
presence of 10 mM Ca²⁺ reveals electron densities in the selectivity filter
consistent with three Ca²⁺ ions aligned on the central axis (Fig. 2a). In
the outer vestibule leading to the selectivity filter, there are two
additional less-intense on-axis peaks associated with weaker surrounding
densities. To confirm the identity of the bound ions, we collected X-ray
diffraction data at a wavelength of 1.75 Å and calculated the F⁺Ca − F⁻Ca

(n = 4–10). f, Reversal potential (E_rev) versus Ca²⁺ concentration. g, Relative
permeability of CavAb and its derivatives as measured from bi-ionic reversal
potentials. P_Ca/P_Na, blue; P_Ba/P_Na, green (n = 5–22). h, Cartoon representation
of the overall structure of CavAb (yellow) superimposed with NavAb (slate).
i, j, Top (i) and side (j) views of the superimposed selectivity filters of
CavAb (yellow) and NavAb (slate) in stick representation. The three original
NavAb residues (black) and substituted CavAb residues (orange) are indicated.
Error bars in b and d–g are ± s.e.m.


anomalous difference map. Two strong peaks followed by a weaker
peak on the intracellular side were found in the selectivity filter along
the ion-conduction pathway, verifying three binding sites for Ca²⁺
(Fig. 2b). We name these Site 1, Site 2 and Site 3 from the extracellular
to the intracellular side.

The Ca²⁺ ion at Site 1 is predominantly coordinated by the carboxyl
groups of D178 (SiteEX in NavAb), which define a plane at the
selectivity filter entrance on the extracellular side of the bound Ca²⁺
ion (Fig. 2b). The distance between the carboxyl oxygen and Ca²⁺ is
about 4.0 Å. This distance suggests that the ion binds at this site in a
hydrated form because the ionic diameter of Ca²⁺ is 2.28 Å, too small
to interact with the carboxylate anions directly but appropriate for
interaction through bound water molecules. Further into the pore, the
four acidic side chains of D177 (SiteHFS in NavAb) are located along
the wall of the selectivity filter rather than projecting into the lumen,
thereby also allowing the binding of a fully hydrated Ca²⁺ ion (Fig. 2b).
Unlike Site 1, this central Ca²⁺-binding site (Site 2) is surrounded
by a box of four carboxylate oxygen atoms from D177 above
and four backbone carbonyl oxygen atoms from L176 below (SiteCEN
in NavAb), with oxygen–Ca²⁺ distances of 4.5 Å and 4.2 Å, respectively
(Fig. 2b). At the intracellular side of the pore, the third Ca²⁺-binding
site (Site 3) is composed of one plane of four carbonyls from
T175 (SiteIN in NavAb), which point inward to the lumen (Fig. 2b).
Here the Ca²⁺ ion lies nearly on the same plane as the T175 carbonyls. The
chemical environment of Site 3 hints at a lower affinity, consistent with
its role in exit of Ca²⁺ from the selectivity filter into the central cavity.
Throughout the selectivity filter, the oxygen–Ca²⁺ coordination
distances are in the range of 4.0–5.0 Å, suggesting that the bound Ca²⁺ ion
is continuously stabilized in a fully hydrated state when it passes through
the pore. We observed diffuse electron density and in favourable cases

Figure 2 | Ca²⁺-binding sites in and near the selectivity filter of NavAb,
CavAb and their derivatives. a, Electron density at the selectivity filter of
¹⁷⁵TLDDWSN¹⁸¹ (also see Supplementary Fig. 4). The 2Fo − Fc electron density
map (contoured at 2σ) of select residues in the selectivity filter with two
diagonally opposed subunits shown in sticks, the Ca²⁺ ions along the ion
pathway in green spheres and water molecules in red spheres. b, Densities at
Ca²⁺-binding sites 1 and 2 from the anomalous difference Fourier map (3σ)
calculated from the diffraction data of a ¹⁷⁵TLDDWSN¹⁸¹ mutant crystal
soaked in the presence of 5 mM Ca²⁺ and collected at 1.75 Å wavelength. The
distances between Ca²⁺ and oxygen atoms (dashed lines) are about 4.0 Å at Site
1 (blue lines), 4.4 Å at Site 2 (blue and magenta lines) and 5.0 Å (magenta line)
at Site 3. For clarity, the subunit closest to the viewer is not shown. c, d, A
comparison between ¹⁷⁵TLEDWSM¹⁸¹ and ¹⁷⁵TLESWSM¹⁸¹ (NavAb)
highlighting the importance of Site 1 for Ca²⁺ selectivity. e, f, A comparison
between ¹⁷⁵TLDDWSD¹⁸¹ (CavAb) and ¹⁷⁵TLEDWSD¹⁸¹ highlighting the role
of Site 2 in fine-tuning Ca²⁺ selectivity. All structures were determined in the
presence of 15 mM Ca²⁺.

discrete water molecules surrounding the bound Ca²⁺, consistent with
the presence of an inner shell of bound waters of hydration
(Supplementary Fig. 5).
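
The distance argument used in this section can be summarized as a simple rule of thumb. The sketch below is an illustration rather than part of the authors' analysis: the 3.0 Å cutoff is an assumption, chosen because direct Ca²⁺–oxygen contacts are typically about 2.4 Å, and the function name is hypothetical. The distances themselves are the ones quoted in the text above.

```python
# Assumed cutoff: direct Ca2+-oxygen contacts are typically about 2.4 Å, so
# anything well beyond that is treated here as water-mediated.
DIRECT_CONTACT_CUTOFF_A = 3.0


def coordination_mode(distance_a, cutoff_a=DIRECT_CONTACT_CUTOFF_A):
    """Classify an oxygen-Ca2+ distance (Å) as direct or water-mediated."""
    return "direct" if distance_a <= cutoff_a else "water-mediated"


# Oxygen-Ca2+ distances quoted in the text (Å).
distances = {
    "Site 1, D178 carboxyl": 4.0,
    "Site 2, D177 carboxyl": 4.5,
    "Site 2, L176 carbonyl": 4.2,
}

for label, d in distances.items():
    print(f"{label}: {d:.1f} Å -> {coordination_mode(d)}")
```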

Although the anomalous difference map did not resolve clear peaks
at the outer vestibule beyond the selectivity filter, we interpret the two
on-axis 2Fo − Fc densities above the three Ca²⁺ sites as two additional
Ca²⁺ ions poised to enter the pore (Fig. 2a). This assignment is
supported by the surrounding eight islets of density, which probably
represent eight stabilized water molecules. Just as at Site 2 in the
selectivity filter, these eight water molecules appear to serve as a square
antiprism cage coordinating a hydrated Ca²⁺ ion at the centre (Fig. 2a).
The second Ca²⁺ ion located at the bottom of this cage is ~4.5 Å away
from the four carboxyl oxygen atoms of D178, suggesting that part of
its second hydration shell is replaced by D178 before the ion enters the
selectivity filter. The selectivity filter, therefore, appears to select Ca²⁺
at its mouth by recognizing the Ca²⁺–H₂O hydration complex and
conduct Ca²⁺ by fitting the Ca²⁺–H₂O hydration complex into the
pore. Because Ca²⁺ is more electropositive than Na⁺, it should bind
more tightly in the ion selectivity filter of CavAb, providing a
mechanistic basis for the block of Na⁺ permeation by Ca²⁺ at low Ca²⁺
concentration and preferential permeation of Ca²⁺ at higher Ca²⁺
concentration (see Discussion).


Functional roles of key selectivity filter residues 
Measurements of bi-ionic reversal potentials revealed that the relative
permeability of different CavAb intermediate constructs for Ca²⁺
follows the order of CavAb (¹⁷⁵TLDDWSD¹⁸¹) > ¹⁷⁵TLDDWSN¹⁸¹
> ¹⁷⁵TLEDWSD¹⁸¹ > ¹⁷⁵TLEDWSM¹⁸¹ > NavAb (¹⁷⁵TLESWSM¹⁸¹)
> ¹⁷⁵TLDSWSM¹⁸¹ (Fig. 1g and Supplementary Fig. 2). A comparison of
the Ca²⁺ selectivity ratios between ¹⁷⁵TLEDWSM¹⁸¹ and ¹⁷⁵TLESWSM¹⁸¹
(NavAb) shows that substitution of S178 with Asp is sufficient to
convert the selectivity from Na⁺ to Ca²⁺ with >100-fold change in P_Ca:P_Na
(Fig. 1g). Placement of the Asp carboxyl side chain at this position allows
for the formation of the first hydrated Ca²⁺-binding site in the
selectivity filter (Fig. 2c and Supplementary Fig. 6). By contrast, S178 in
NavAb binds Ca²⁺ directly by displacing its hydration shell, which
blocks conductance of both Na⁺ and Ca²⁺ (Fig. 2d). Therefore,
formation of Site 1 for binding hydrated Ca²⁺ is both necessary and
sufficient for conferring Ca²⁺ selectivity over Na⁺ to NavAb.

The Ca²⁺ selectivity ratio of CavAb (¹⁷⁵TLDDWSD¹⁸¹) is 5.5-fold
higher than that of ¹⁷⁵TLEDWSD¹⁸¹ (Fig. 1g). This functional difference reflects
a role of Site 2 in adjusting Ca²⁺ selectivity. Unlike the side
chains of D177 in CavAb (¹⁷⁵TLDDWSD¹⁸¹), which interact with the
Ca²⁺ ion (Fig. 2e), the carboxyl group of E177 in ¹⁷⁵TLEDWSD¹⁸¹
swings away from the selectivity filter and forms a hydrogen bond with
D181 and the main-chain nitrogen atoms of S180 (Fig. 2f and
Supplementary Fig. 7). Site 2 in ¹⁷⁵TLEDWSD¹⁸¹, therefore, is exclusively
formed by the four carbonyl oxygen atoms of L176, which conceivably
leads to a lower Ca²⁺-binding affinity and a decreased Ca²⁺
selectivity. This comparison highlights both the importance of Site 2 in
supporting high Ca²⁺ selectivity and the critical role of the backbone
carbonyl groups of L176 in constructing this ion-binding site.

Distinct from D177 and D178, the N181 residue of ¹⁷⁵TLDDWSN¹⁸¹
lies outside of the ion-conducting pore and is not directly involved in
Ca²⁺ ion coordination. In close proximity to the carboxyl groups of
D178, which form a ring that lines the perimeter of the pore entryway,
the side chain of N181 embraces the perimeter of the D178 ring by
donating a hydrogen bond to its side-chain carboxyls (Fig. 3a). Such
a structural arrangement is also found in CavAb (¹⁷⁵TLDDWSD¹⁸¹)
(Fig. 3b), although the more electronegative environment created by
the extra negatively charged residue, D181, probably attracts Ca²⁺
more strongly and confers a 4- to 5-fold higher degree of Ca²⁺
selectivity to CavAb (¹⁷⁵TLDDWSD¹⁸¹) in comparison to ¹⁷⁵TLDDWSN¹⁸¹
(Fig. 1g and Supplementary Fig. 3).

¹⁷⁵TLDDWSM¹⁸¹, which has the hydrophobic residue M181 packed
next to the D178 ring, is the only CavAb intermediate that does not
conduct Ca²⁺ (Supplementary Fig. 2). The crystal structure of this mutant
reveals a blocking Ca²⁺ ion tightly bound at Site 1 in a dehydrated state
with an oxygen–ion distance of 2.3 Å (Fig. 3c). Superposition
analysis shows few structural differences between ¹⁷⁵TLDDWSM¹⁸¹ and
¹⁷⁵TLDDWSN¹⁸¹, except for the side chain of D178, which is fixed
by N181 in ¹⁷⁵TLDDWSN¹⁸¹ but unconstrained in ¹⁷⁵TLDDWSM¹⁸¹
(Fig. 3a, c). This comparison indicates that N181 in ¹⁷⁵TLDDWSN¹⁸¹
and D181 in CavAb have critical roles in engaging D178 and allowing
the reversible binding of the Ca²⁺–H₂O hydration complex for active
Ca²⁺ conductance. Although the subtle difference in Ca²⁺ selectivity
between ¹⁷⁵TLEDWSD¹⁸¹ and ¹⁷⁵TLEDWSM¹⁸¹ seems to argue against
this conclusion (Fig. 1g), E177 in ¹⁷⁵TLEDWSM¹⁸¹ actually has a
structural role equivalent to that of N181 in ¹⁷⁵TLDDWSN¹⁸¹: by pointing
away from the selectivity filter lumen, E177 forms a carboxylate–carboxylate

Figure 3 | Ion binding and block of CavAb and its derivatives. a, b, Top view
of Site 1 with a hydrated Ca²⁺ ion coordinated by D178 with the help of N181
and D181 in ¹⁷⁵TLDDWSN¹⁸¹ and ¹⁷⁵TLDDWSD¹⁸¹ (CavAb), respectively.
c, Binding of a dehydrated Ca²⁺ ion at Site 1 in the nonconductive
¹⁷⁵TLDDWSM¹⁸¹ mutant. d, Coordination of a hydrated Ca²⁺ ion at Site 1 of
the ¹⁷⁵TLEDWSM¹⁸¹ mutant. Despite the absence of a polar residue at amino
acid 181, E177 in ¹⁷⁵TLEDWSM¹⁸¹ is able to hold D178 in place to allow the
binding of a hydrated Ca²⁺ ion. e, f, Block of Ca²⁺ conductance by the


pair with D178 and holds it in a conduction-competent position (Fig. 3d 
and Supplementary Fig. 8). 


Block of NavAb and CavAb channels by divalent cations
Cd²⁺, Mn²⁺ and other inorganic cations are effective blockers of Cav
channels. Block of Ca²⁺ conductance of CavAb by Cd²⁺ and Mn²⁺
gives Ki values of 1.78 µM for Cd²⁺ and 526 µM for Mn²⁺ (Fig. 3e, f,
blue). Cd²⁺ has a lower affinity and Mn²⁺ has a higher affinity for block
of ¹⁷⁵TLDDWSN¹⁸¹ (Fig. 3e, f, red). Crystals with bound Cd²⁺ and
Mn²⁺ were obtained by soaking CavAb crystals in a cryo-solution
containing these heavy metal ions, and the anomalous difference map
was calculated from a data set collected at 1.75 Å wavelength. The
structures show that both Cd²⁺ and Mn²⁺ bind in the selectivity filter at the
central site (Site 2), which is coordinated by the side chains of the four
D177 residues and the carbonyl groups of L176 (Fig. 3g, h). Locked at
this site, these blocking ions would inhibit the Ca²⁺ current by
competitively binding to the high-affinity site required for Ca²⁺ permeation.
Another important common feature of the two blocking complexes of
CavAb is the block of permeation by binding of a single divalent cation
within the selectivity filter, which supports the hypothesis that at least
two divalent-cation-binding sites must be located close enough to induce
repulsive interactions and allow divalent cation conductance by a
knock-off mechanism. Because they are smaller than Ca²⁺, the bound Cd²⁺
(d = 2.18 Å) and Mn²⁺ (d = 1.94 Å) must interact with the selectivity
filter through bound waters of hydration, and electron density
consistent with bound waters of hydration is observed in our structures
(Supplementary Fig. 5).
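
To give a feel for what these blocking constants mean, the sketch below evaluates a generic single-site inhibition curve. It is an assumption-laden illustration, not the fitting procedure used for Fig. 3e, f: it assumes a Hill coefficient of 1 (one blocking ion per channel, in line with the single occupancy described above), treats the quoted Ki values as half-blocking concentrations, and uses a hypothetical function name and hypothetical test concentrations.

```python
def fraction_unblocked(blocker_um, half_block_um, hill=1.0):
    """Fraction of current remaining at a given blocker concentration (µM).

    Generic Hill-type inhibition: I/I0 = 1 / (1 + ([B]/K)^n).
    """
    if blocker_um < 0 or half_block_um <= 0:
        raise ValueError("concentrations must be non-negative, K positive")
    return 1.0 / (1.0 + (blocker_um / half_block_um) ** hill)


# Half-blocking concentrations quoted in the text for CavAb (µM).
K_CD_UM = 1.78
K_MN_UM = 526.0

# Hypothetical blocker concentrations (µM) chosen for illustration.
for conc in (1.0, 10.0, 100.0, 1000.0):
    cd = fraction_unblocked(conc, K_CD_UM)
    mn = fraction_unblocked(conc, K_MN_UM)
    print(f"{conc:7.1f} µM: Cd2+ leaves {cd:.2f}, Mn2+ leaves {mn:.2f} of the current")
```

With constants that differ by roughly 300-fold, a concentration of Cd²⁺ that blocks most of the current leaves the Mn²⁺ curve almost untouched.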


Ion binding at the Ca²⁺ selectivity filter

To assess the properties of the three Ca²⁺-binding sites in the
selectivity filter of ¹⁷⁵TLDDWSN¹⁸¹, we titrated the concentration of Ca²⁺
in the cryo-solution and calculated the anomalous difference maps. At
low Ca²⁺ concentration, two strong peaks of approximately equal
intensity are found at Site 1 and Site 2 (Supplementary Fig. 9). As the Ca²⁺
concentration is raised, the electron density of Site 2 is substantially
enhanced, but the peak intensity is reduced at Site 1 and remains low at
Site 3 (Supplementary Fig. 9). These results suggest that the central site
has the highest affinity, whereas Site 3 is the weakest. It is probable that
this titration pattern reflects independent binding of Ca²⁺ to Sites 1, 2

indicated concentrations of Cd²⁺ and Mn²⁺. ¹⁷⁵TLDDWSD¹⁸¹: IC₅₀(Cd²⁺),
1.7 ± 0.04 µM; IC₅₀(Mn²⁺), 526 ± 22 µM. ¹⁷⁵TLDDWSN¹⁸¹: IC₅₀(Cd²⁺),
5.9 ± 0.4 µM; IC₅₀(Mn²⁺), 388 ± 47 µM. Error bars are ± s.e.m. g, h, Side view
of the Cd²⁺- and Mn²⁺-binding sites in the selectivity filter of CavAb. The
anomalous difference Fourier map densities (blue mesh, contoured at 5σ) of the
bound blocking ions are calculated using diffraction data collected at 1.75 Å
wavelength. For clarity, the residues forming the selectivity filter from the two
subunits in front of and behind the plane of the drawing were removed.


and 3 located in different individual molecules of CavAb at low Ca²⁺
concentration, whereas increasing concentrations of Ca²⁺ saturate Site
2 in most or all individual CavAb molecules and reduce or eliminate
binding at Sites 1 and 3 by repulsion. Importantly, the two flanking sites
have lower affinity than the central site, as proposed in the ‘stepwise
binding model’ of Cav channel permeation. In this model, the
presence of flanking sites of intermediate affinity facilitates the movement
of Ca²⁺ into and out of a central high-affinity site, which can result in
high ion conductance, even in the limiting case where there is no
repulsion between bound ions.

Consistent with high binding affinity, Ca²⁺ binds at Site 2 with its
first hydration shell waters coordinated with eight oxygen atoms from
the channel (Fig. 2b and Supplementary Fig. 5). By contrast, Ca²⁺ at
Site 1 is mainly stabilized by one plane of four carboxyl groups from
D178. The distance between the Ca²⁺ ion at Site 1 and the carboxyl group
of D177 at Site 2 is about 5.5–6 Å. As the Ca²⁺ ion moves inward, this
distance will be reduced enough for D177 to form a stable coordination
with the moving Ca²⁺ ion. This spatial configuration suggests that the
two sites are separated by a low energy barrier. The differences of
negative charge between D178 and the carbonyls of T175 and the differences
in the geometry of their interactions with Ca²⁺ provide a plausible
explanation for the higher Ca²⁺-binding affinity at Site 1 than Site 3.


Ion permeation mechanism 


The three Ca²⁺-binding sites in the selectivity filter of ¹⁷⁵TLDDWSN¹⁸¹
are separated by a distance of about 4.5 Å, which would result in
substantial electrostatic repulsive interactions between bound ions. As in
the case of the KcsA channel, it is energetically unfavourable for Ca²⁺
ions to occupy adjacent sites simultaneously. This leads directly to our
hypothesis of two interchangeable functional states of the
selectivity filter in the crystal structure (Fig. 4a, b). In State 1, Ca²⁺ ions occupy
Site 1 and Site 3. In State 2, a single Ca²⁺ ion occupies Site 2. These two
states might be further coupled with one of the two Ca²⁺ ions at the outer
vestibule ready to enter the pore (Fig. 4c). The transition between these
two states occurs either when Ca²⁺ jumps from Site 1 or 3 to Site 2 or a
third ion enters on one side of the filter, causing an ion to move into
Site 2. It is probable that our crystal structures reflect a mixed
population of CavAb molecules in which only Site 2 is occupied by Ca²⁺ plus
CavAb molecules in which Site 1 and/or Site 3 are occupied. Because of

Figure 4 | Catalytic cycle for Ca²⁺ conductance by CavAb. a, An ionic
occupancy state diagram of CavAb showing two proposed low energy states
and the potential transitions that connect them. Each state of the selectivity
filter is represented by a three-box rectangle with Sites 1–3 going from left to
right. Green circles represent Ca²⁺ ions. Note that transitions in the inner circle
potentially lead to ion repulsion, which might facilitate conduction. These
transitions in the inner circle are more probable than those in the outer circle, as
denoted by the bold arrows. b, The structural basis of the ionic occupancy
states depicted in the inner circle of the state diagram shown on the left. The
clockwise cycle represents a path for inward flux of Ca²⁺ ions through the
selectivity filter. c, Coupling of extracellular Ca²⁺-binding sites and the three
sites within the selectivity filter in the two proposed ionic occupancy states.
When two Ca²⁺ ions bind to positions 1 and 3 in the filter, the entryway Ca²⁺
ion is placed furthest from the pore (left). When one Ca²⁺ ion binds to position
2 within the filter, the ion outside the filter is pulled closer to the pore (right).


the high concentration of Ca²⁺ in the extracellular solution, Ca²⁺ will
prefer to enter Site 1 and the weak binding of Ca²⁺ to Site 3 will force loss
of Ca²⁺ into the low Ca²⁺ concentration in the cytosol. This generates
a unidirectional flux of Ca²⁺ into the cell (Fig. 4c). The
three-ion-occupied state would be manifest only when the external Ca²⁺
concentration is increased enough that the flux reaches a limiting value.
The presence of the lower-affinity Site 3 flanking the central cavity
would further accelerate the flux of ions by allowing stepwise binding
with relatively low chemical potential energy barriers. The ionic
repulsion between Ca²⁺ ions bound at these sites and their
stepwise change in binding affinity work together to allow rapid
conductance in spite of the intrinsic high affinity for Ca²⁺ binding.
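
The scale of the repulsion invoked here can be estimated with a point-charge Coulomb calculation. This is a rough sketch under stated assumptions, not a result from the paper: the ions are treated as point charges in a uniform dielectric, the relative permittivity inside the pore is unknown (three hypothetical values are tried), and the 9.0 Å distal separation assumes that Sites 1 and 3 lie two 4.5 Å steps apart on the pore axis. Function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge, C
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F m^-1
K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J K^-1


def coulomb_energy_joule(z1, z2, distance_angstrom, relative_permittivity):
    """Point-charge Coulomb energy (J) for valences z1, z2 at a separation in Å."""
    r_m = distance_angstrom * 1e-10
    return (z1 * z2 * E_CHARGE ** 2) / (
        4.0 * math.pi * EPSILON_0 * relative_permittivity * r_m
    )


def in_kt(energy_joule, temp_k=298.15):
    """Express an energy in units of kT at the given temperature."""
    return energy_joule / (K_BOLTZMANN * temp_k)


# Relative permittivities are assumptions: 80 is bulk water, lower values
# stand in for a confined pore environment.
for eps_r in (80.0, 40.0, 20.0):
    adjacent = in_kt(coulomb_energy_joule(2, 2, 4.5, eps_r))  # neighbouring sites
    distal = in_kt(coulomb_energy_joule(2, 2, 9.0, eps_r))    # Sites 1 and 3 (assumed)
    print(f"eps_r = {eps_r:4.0f}: adjacent {adjacent:5.1f} kT, distal {distal:5.1f} kT")
```

Whatever permittivity is assumed, the energy for adjacent occupancy is twice that for occupancy of the two distal sites, which is the qualitative point behind the two-state proposal.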


Discussion 


The mechanism underlying the dramatic difference in selectivity for
Ca²⁺ over Na⁺ in CavAb versus NavAb is different from the
mechanisms responsible for selectivity of K⁺ over Na⁺ and for Ca²⁺ block
revealed by high-resolution structural studies of the NaK channel.
In the NaK channel, K⁺ conductance is favoured by the presence of
four binding sites formed by backbone carbonyls, rather than two or
three, and structural changes in amino acid residues outside the ion
selectivity filter fine-tune the electronegativity of backbone carbonyls
in the selectivity filter and thereby determine the affinity for block by
Ca²⁺ at the extracellular mouth of the pore. This difference in the
underlying mechanism for control of ion selectivity reflects the
fundamental difference in ion permeation in NavAb and CavAb versus
K⁺ channels. In NavAb and CavAb, permeant ions interact with both
amino acid side chains and backbone carbonyls in the ion selectivity
filter primarily through waters of hydration, whereas K⁺ channels
select their permeant ions through direct interaction of the dehydrated
ions with backbone carbonyls.

Our results reveal an unexpected structural basis for Ca²⁺
selectivity and conductance in CavAb channels, in which most or all
interactions of Ca²⁺ with the pore are made through its inner shell of
waters of hydration. A set of three Ca²⁺-binding sites cooperate in a
knock-off mechanism in which the selectivity filter oscillates
primarily between two states with either one hydrated Ca²⁺ bound at the
central site or two hydrated Ca²⁺ ions bound at the distal sites. The
high-affinity binding of Ca²⁺ to Sites 1 and 2 ensures that Na⁺ and
other monovalent cations cannot permeate, while the high Ca²⁺
concentration in the extracellular solution enables unidirectional flux by
driving rapid occupancy of Site 1. The ionic repulsion between Ca²⁺
ions bound at these sites and their stepwise change in binding affinity
work together to allow rapid conductance in spite of the intrinsic high
affinity for Ca²⁺ binding. Although our resolution does not allow us
to see all of the waters of hydration that are implied by our structure,
we do observe electron density surrounding bound Ca²⁺ ions at Sites
1, 2 and 3 that we believe represents the inner shell of waters of
hydration (Supplementary Fig. 5). This electron density is blurred,
as if there is a diversity of arrangements of the bound water molecules
in individual CavAb molecules in our crystals because their
hydrogen-bonding requirements can be accommodated in multiple ways
between the bound cations and their coordinating carboxyl and
carbonyl oxygens that comprise Sites 1, 2 and 3. In our most favourable
structure (Supplementary Fig. 5g, h), four discrete water molecules are
observed at Site 3. Altogether, we believe that these images provide
direct support for the conclusion that bound Ca²⁺ ions are
surrounded by an inner shell of waters of hydration that are dynamic
and can easily exchange local hydrogen-bonding partners. This is a
unique ion conduction mechanism, which allows high-affinity
interaction of hydrated Ca²⁺ ions while mediating their rapid movement
from the extracellular vestibule, through the three ion coordination
sites of the selectivity filter, through the central cavity, and finally into
the cytosol.
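
The two-state picture summarized above follows from a simple combinatorial rule: with three sites in a row and no two neighbouring sites occupied at once, only a handful of occupancy patterns remain. The sketch below enumerates them. It is a simplification for illustration, since the text itself allows that a three-ion state may appear at high external Ca²⁺; the function name and the tuple encoding (Site 1, Site 2, Site 3; 1 = occupied) are hypothetical.

```python
from itertools import product


def allowed_states(n_sites=3):
    """Occupancy patterns in which no two adjacent sites are both occupied."""
    states = []
    for occ in product((0, 1), repeat=n_sites):
        # Reject any pattern with two neighbouring occupied sites.
        if any(a and b for a, b in zip(occ, occ[1:])):
            continue
        states.append(occ)
    return states


STATE_1 = (1, 0, 1)  # Ca2+ at Site 1 and Site 3
STATE_2 = (0, 1, 0)  # a single Ca2+ at Site 2

for state in allowed_states():
    label = {STATE_1: "State 1", STATE_2: "State 2"}.get(state, "")
    print(state, label)
```

Of the five patterns that survive, State 1 is the only one holding two ions and State 2 is the only one that uses the high-affinity central site.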

Biophysical modelling of Ca²⁺ permeation in vertebrate Cav channels
has led to multiple proposed mechanisms, most of which involve
two or more Ca²⁺-binding sites, yet only a single high-affinity site that
is required for both permeation and Ca²⁺ block was identified by
mutagenesis and physiological analyses. Our results with CavAb
channels resolve this apparent discrepancy by showing that multiple
Ca²⁺-binding sites are necessary for permeation, but only Site 2 binds
divalent cations with sufficient affinity for block. Ca²⁺ is conducted as a
hydrated cation (Supplementary Fig. 5), consistent with the large
estimated functional diameter of vertebrate Cav channels of 6 Å (ref. 26).
Detailed structure–function studies of vertebrate Cav channels show
that mutations of the four residues equivalent to E177 have distinct
effects on Ca²⁺ conductance and block, implying that domain-specific
interactions with Ca²⁺ have evolved in vertebrate four-domain Cav
channels. Vertebrate Cav channels might share similar
molecular mechanisms for Ca²⁺ permeation and selectivity despite their
pseudosymmetrical four-domain configuration.

Note added in proof: Crystal structures of isolated pore domains of
other bacterial Nav channels reveal an open pore conformation for
NavMs (ref. 30) and a binding site for blocking Ca²⁺ ions in NavAe
(ref. 31), which is formed primarily by the equivalent of Ser 178 in
NavAb.


METHODS SUMMARY 


CavAb and its derivative constructs were expressed in Trichoplusia ni insect cells
and purified using anti-Flag resin and size-exclusion chromatography,
reconstituted into DMPC:CHAPSO bicelles, and crystallized over an ammonium sulphate
solution containing 0.1 M Na-citrate, pH 4.75. The anomalous data sets were
collected at 1.75 Å wavelength with crystals soaked in a stabilizing solution containing
various concentrations of cations. Electrophysiological experiments were
performed in T. ni cells using standard protocols.
METHODS


Protein expression and purification. The pFastBac-Flag-NavAb(I217C) that was
used as the genetic background for CavAb constructs was described previously.
CavAb and its derivatives (¹⁷⁵TLDDWSN¹⁸¹, ¹⁷⁵TLEDWSD¹⁸¹, ¹⁷⁵TLEDWSM¹⁸¹
and ¹⁷⁵TLDSWSM¹⁸¹) were generated via site-directed mutagenesis using
QuikChange (Stratagene). Recombinant baculoviruses were produced using the
Bac-to-Bac system (Invitrogen), and T. ni insect cells were infected for large-scale protein
purification. Cells were collected 72 h after infection and re-suspended in buffer A
(50 mM Tris-HCl, pH 8.0, 200 mM NaCl) supplemented with protease inhibitors
and DNase. After sonication, digitonin (EMD Biosciences) was added to 1% and
solubilization was carried out for 1–2 h at 4 °C. Clarified supernatant was then
incubated with anti-Flag M2-agarose resin (Sigma) for 1–2 h at 4 °C with gentle mixing.
Flag-resin was washed with ten column volumes of buffer B (buffer A
supplemented with 0.12% digitonin) and eluted with buffer B supplemented with 0.1 mg ml⁻¹
Flag peptide. The eluent was concentrated and then passed over a Superdex 200
column (GE Healthcare) in 10 mM Tris-HCl, pH 8.0, 100 mM NaCl and 0.12%
digitonin. The peak fractions were concentrated using a Vivaspin 30k centrifugal
device.
Crystallization and data collection. CavAb and its derivatives were concentrated
to ~20 mg ml⁻¹ and reconstituted into DMPC:CHAPSO (Anatrace) bicelles
according to standard protocols. The protein–bicelle preparation and a well solution
containing 1.8–2.0 M ammonium sulphate, 100 mM Na-citrate, pH 5.0, were mixed
in a 1:1 ratio and set up in a hanging-drop vapour-diffusion format. The
Ca²⁺-derivative crystals were obtained by soaking CavAb and other mutant crystals in a
cryo-protection solution (0.1 M Na-acetate, pH 5.0, 26% glucose and 2.0 M
ammonium sulphate) containing the indicated concentrations of Ca²⁺ for 40–60 min at
4 °C. The Cd²⁺ and Mn²⁺ derivatives were obtained by soaking CavAb in the
presence of 100 mM Cd²⁺ or Mn²⁺, respectively. Crystals were then plunged into
liquid nitrogen and maintained at 100 K during all data collection procedures.
All anomalous diffraction data sets were collected at 1.75 Å with the same
synchrotron radiation source (Advanced Light Source, BL8.2.1). To optimize the
anomalous signal, the data sets were collected by using the ‘inverse beam strategy’
with a wedge size of 5°.
Structure determination, refinement and analyses. X-ray diffraction data were
integrated and scaled with the HKL2000 package and further processed with the
CCP4 package. The structures of CavAb and its derivatives were solved by molecular
replacement using an individual subunit of the NavAb structure (PDB
code 3RVY) as the search template. The data sets were processed in the C2 space
group, with four molecules in one asymmetric unit. We chose the I222
space group to process the data sets for initial structural determination, but we
found that the bound ions were slightly off-centre with respect to the axis of the
pore. Therefore, to better interpret the coordination of Ca²⁺, Cd²⁺ and Mn²⁺, we
solved the structures in the C2 space group. Crystallography and NMR System
software was used for refinement of coordinates and B-factors. Final models were
obtained after several cycles of refinement with REFMAC and PHENIX and
manual re-building using COOT. The geometries of the final structural models of
CavAb and its derivatives were verified using PROCHECK. The divalent cations
were identified by anomalous difference Fourier maps calculated using data collected
at a wavelength of 1.75 Å for Ca²⁺, Cd²⁺ and Mn²⁺. Detailed crystallographic
data and refinement statistics for all the constructs are shown in Supplementary Table 1.
All structural figures were prepared with PyMol.
Electrophysiology. Wild-type NavAb expressed by infection of insect cells (High5)
activates at very negative potentials (V_1/2 ~ −98 mV) and shows a strong, late
use-dependent phase of slow inactivation. Mutation N49K shifts the activation curve
~75 mV to more positive potentials and abolishes the use-dependent inactivation.
All NavAb/CavAb constructs used were made on the background of the N49K mutation
and showed good expression, allowing measurement of ionic currents 24-48 h
after infection.

Whole-cell currents were recorded using an Axopatch 200 amplifier (Molecular
Devices) with glass micropipettes (2-5 MΩ). Capacitance was subtracted and
80-90% of series resistance was compensated using internal amplifier circuitry. For
reversal potential measurements, the intracellular pipette solution contained (in
mM): 100 NaF, 10 NaCl, 20 HEPES-Na, 10 EGTA, pH 7.4 (adjusted with NaOH,
[Na⁺]total = 146 mM). The extracellular solution contained (in mM): 10 CaCl₂, 140
NMDG-methanesulphonate, 20 HEPES, pH 7.4 (adjusted with Ca(OH)₂, [Ca²⁺]total = 12 mM).
For Ba²⁺ reversal potential measurements, BaCl₂ replaced CaCl₂. Current-voltage
(I-V) relationships were recorded in response to steps to voltages ranging from
−100 to +70 mV in 5- or 10-mV increments from a holding potential of −100 mV.
Pulses were generated and currents were recorded using Pulse software controlling
an Instrutech ITC18 interface (HEKA). Data were analysed using Igor Pro 6.2
(WaveMetrics). Sample sizes were chosen to give s.e.m. values of less than 10% of
peak values based on prior experimental experience.

Relative permeability values were calculated as described. The permeability
ratio was calculated as:

$$\frac{P_x}{P_{\mathrm{Na}}} = \frac{a_{\mathrm{Na}}\,\exp\!\left(\frac{E_{\mathrm{rev}}F}{RT}\right)\left[\exp\!\left(\frac{E_{\mathrm{rev}}F}{RT}\right)+1\right]}{4a_x}$$

in which F, R, T and E_rev are the Faraday constant, gas constant, absolute temperature
and reversal potential, respectively; a_x denotes the activity of the external divalent
ion x (Ca²⁺ or Ba²⁺) and a_Na the activity of intracellular sodium. The calculated
activity coefficients were γ_Ca = 0.33, γ_Ba = 0.30 and γ_Na = 0.74. All potentials were
corrected for the experimentally determined liquid junction potential.
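
As an illustration of the permeability-ratio calculation, the following minimal sketch evaluates P_x/P_Na from a reversal potential using the activity coefficients and total concentrations quoted above. The recording temperature (295.15 K) and the example reversal potential are assumptions made for illustration only; they are not taken from the text, and the function name is hypothetical.

```python
import math

FARADAY = 96485.33212   # Faraday constant, C mol^-1
GAS_CONSTANT = 8.314462618  # gas constant, J mol^-1 K^-1


def permeability_ratio(e_rev_mV, conc_x_mM, conc_na_mM,
                       gamma_x, gamma_na=0.74, temp_K=295.15):
    """Return P_x/P_Na for an external divalent ion x and internal Na+.

    temp_K defaults to an assumed room temperature (not stated in the text).
    """
    a_x = gamma_x * conc_x_mM        # activity of external divalent ion
    a_na = gamma_na * conc_na_mM     # activity of intracellular sodium
    u = math.exp(e_rev_mV * 1e-3 * FARADAY / (GAS_CONSTANT * temp_K))
    return a_na * u * (u + 1.0) / (4.0 * a_x)


# Hypothetical example: external Ca2+ (12 mM total), internal Na+ (146 mM total),
# and an assumed reversal potential of +30 mV.
ratio = permeability_ratio(30.0, conc_x_mM=12.0, conc_na_mM=146.0, gamma_x=0.33)
print(f"P_Ca/P_Na = {ratio:.1f}")
```

At E_rev = 0 the exponential terms reduce to 1 and the ratio becomes a_Na/(2a_x), which is a quick sanity check on any implementation.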

For anomalous mole fraction and blocking experiments, the divalent cation (Ca²⁺,
Cd²⁺ or Mn²⁺) was diluted in 10 mM BaCl₂, 140 mM NMDG-methanesulphonate
and 10 mM HEPES and perfused for 2-3 min before recording an I-V curve.
The peak value of the I-V curve was measured and normalized to the peak value
without the divalent cation.
Strong neutrino cooling by cycles of electron capture
and β⁻ decay in neutron star crusts


H. Schatz, S. Gupta, P. Möller, M. Beard, E. F. Brown, A. T. Deibel, L. R. Gasques, W. R. Hix, L. Keek,
R. Lau, A. W. Steiner & M. Wiescher


The temperature in the crust of an accreting neutron star, which
comprises its outermost kilometre, is set by heating from nuclear
reactions at large densities, neutrino cooling and heat transport
from the interior. The heated crust has been thought to affect
observable phenomena at shallower depths, such as thermonuclear
bursts in the accreted envelope. Here we report that cycles of electron
capture and its inverse, β⁻ decay, involving neutron-rich nuclei
at a typical depth of about 150 metres, cool the outer neutron star
crust by emitting neutrinos while also thermally decoupling the surface
layers from the deeper crust. This ‘Urca’ mechanism has been
studied in the context of white dwarfs and type Ia supernovae,
but hitherto was not considered in neutron stars, because previous
models computed the crust reactions using a zero-temperature
approximation and assumed that only a single nuclear species was
present at any given depth. The thermal decoupling means that X-ray
bursts and other surface phenomena are largely independent of the
strength of deep crustal heating. The unexpectedly short recurrence
times, of the order of years, observed for very energetic thermonuclear
superbursts are therefore not an indicator of a hot crust, but
may point instead to an unknown local heating mechanism near the
neutron star surface.

Continual accretion onto a neutron star pushes the ashes of surface
thermonuclear burning, which is often observed as type I X-ray bursts,
to greater pressures and densities, at which the nuclei form a rigid lattice
known as the crust. With increasing depth, these ashes are transformed
by capture of degenerate electrons into increasingly neutron-rich nuclei.
An electron-capture reaction, (Z, A) + e⁻ → (Z − 1, A) + ν_e, involves
a parent nucleus (Z, A) with charge number Z and mass number A and
gives rise to a daughter nucleus (Z − 1, A) with the emission of an
electron neutrino; this occurs at a well-defined depth, where the electron
chemical potential μ_e ≈ |Q_EC| + E_x. Here Q_EC is the (negative)
electron-capture Q-value (the difference between the parent and daughter
ground-state masses and hence the energy needed for the reaction to
occur) and E_x is the excitation energy of the lowest state in the daughter
nucleus that can be populated by electron capture. In the commonly
used zero-temperature approximation, the reverse β⁻-decay reaction
(Z − 1, A) → (Z, A) + e⁻ + ν̄_e is blocked because there is no phase space
available in which to re-emit the captured electron. At finite temperature
and for E_x < kT, however, β⁻ decay via the re-emission of an electron
with an energy close to |Q_EC| is not completely blocked. As a result,
the boundary between a layer containing nuclei (Z, A) and a deeper
layer containing (Z − 1, A) is a shell with mixed composition spanning
a range of electron chemical potential |Q_EC| − kT ≲ μ_e ≲ |Q_EC| + kT
that corresponds to a thickness of a few metres within the neutron star
crust. Inside this shell, both electron capture and its inverse, β⁻ decay,
occur (see Fig. 1). If these reactions cycle back-and-forth rapidly, the


[Figure 1a layer labels, from shallow to deep: composition (Z, A); Urca shell with both (Z, A) and (Z − 1, A); composition (Z − 1, A)]


Figure 1 | Schematic nuclear energy-level diagrams for an electron-capture/
β⁻-decay pair. a, Illustration of compositional layers in the neutron star crust;
b-d, energy-level diagrams. In the shallow region above the Urca shell, where
the nuclear composition has charge number Z and mass number A, (Z, A),
the electron chemical potential μ_e is less than |Q_EC|, the energy threshold for
electron capture, and electron capture is energetically blocked (b). In the deeper
region below the Urca shell, μ_e > |Q_EC|: electron capture has therefore
occurred, the composition consists of nuclei (Z − 1, A), and the degenerate
electrons block the phase space for electron emission via β⁻ decay (d). In the
Urca shell between these regions, μ_e ≈ |Q_EC|. As a result, both electron capture
(EC) and β⁻ decay (β⁻) are possible (c), and rapid cycling between the nuclei
(Z, A) and (Z − 1, A) leads to a strong neutrino emissivity.
Table 1 | Electron-capture/β⁻-decay pairs with highest cooling rates

Electron-capture/β⁻-decay pair    Density†          Chemical potential†    Luminosity‡
Parent    Daughter*               (10¹⁰ g cm⁻³)     (MeV)                  (10³⁶ erg s⁻¹)
²⁹Mg      ²⁹Na                    4.79              13.3                   24
⁵⁵Ti      ⁵⁵Sc, ⁵⁵Ca              3.73              12.1                   11
³¹Al      ³¹Mg                    3.39              11.8                   8.8
³³Al      ³³Mg                    5.19              13.4                   8.3
⁵⁶Ti      ⁵⁶Sc                    5.57              13.8                   3.5
⁵⁷Cr      ⁵⁷V                     1.22              8.3                    1.6
⁵⁷V       ⁵⁷Ti, ⁵⁷Sc              2.56              10.7                   1.6
⁶³Cr      ⁶³V                     6.82              14.7                   0.97
¹⁰⁵Zr     ¹⁰⁵Y                    3.12              11.2                   0.92
⁵⁹Mn      ⁵⁹Cr                    0.945             7.6                    0.88
¹⁰³Sr     ¹⁰³Rb                   5.30              13.3                   0.65
⁹⁶Kr      ⁹⁶Br                    6.40              14.3                   0.65
⁶⁵Fe      ⁶⁵Mn                    2.34              10.3                   0.60
⁶⁵Mn      ⁶⁵Cr                    3.55              11.7                   0.46

* The listing of two electron-capture daughter isotopes means that two subsequent reaction pairs occur
in the same layer.
† The transition always occurs at the specified electron chemical potential. The density, which is for a
composition consisting of nuclei with a single mass number A, will only be approximate for an arbitrary
composition.
‡ The cooling luminosity L_ν scales with temperature T, local gravitational acceleration g, neutron star
radius (in the local rest frame) R, and mass fraction X of the respective electron-capture/β⁻-decay pair
as L_ν ∝ X R² g⁻¹ T⁵. The temperature scaling assumes E_x < kT. For further details, see Supplementary
Information section 1. The values for L_ν we quote here are for T = 0.51 GK, g = 1.85 × 10¹⁴ cm s⁻²,
R = 12 km and X = 1. The existence of the ⁵⁶Ti-⁵⁶Sc electron-capture/β⁻-decay pair depends strongly
on nuclear masses. In all other cases nuclear-physics-related uncertainties of the predicted
luminosities are of the order of a factor of 3-4 (see Supplementary Information section 5).


result is a strong neutrino emission, known as an Urca process, that
cools the neutron star crust.
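
To give a feel for how narrow the Urca shell is in electron chemical potential, the sketch below evaluates the window |Q_EC| − kT to |Q_EC| + kT described in the text. It assumes, for illustration only, that the threshold equals the 13.3 MeV chemical potential listed for the highest-luminosity pair in Table 1 (that is, E_x ≈ 0) and uses the T = 0.51 GK reference temperature of that table. The function name is hypothetical.

```python
K_B_MEV_PER_K = 8.617333262e-11  # Boltzmann constant in MeV per kelvin


def urca_shell_window(threshold_MeV, temp_K):
    """Return (lower, upper, kT) in MeV for the chemical-potential window
    threshold - kT <= mu_e <= threshold + kT."""
    kT = K_B_MEV_PER_K * temp_K
    return threshold_MeV - kT, threshold_MeV + kT, kT


# Assumed inputs: 13.3 MeV threshold (E_x taken as ~0) and T = 0.51 GK.
lower, upper, kT = urca_shell_window(13.3, 0.51e9)
print(f"kT = {kT * 1e3:.1f} keV; shell spans {lower:.3f} to {upper:.3f} MeV")
```

Because kT is tens of keV while the threshold is of order 10 MeV, the shell occupies well under one per cent of the chemical-potential range, in line with the metre-scale thickness quoted in the text.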

Such Urca shells are thought to also operate in white dwarfs, type
Ia supernovae and stellar ONeMg cores producing electron-capture
supernovae. But the effect has not been considered in the context of
accreting neutron stars. Most earlier models of accreted crusts were
computed in the zero-temperature limit, in which the composition
switches sharply at the energetic thresholds with no available phase
space for cycling; indeed, in this limit, only one nuclide of a reaction
pair is present at a given depth. Urca shell cooling relies on phase space
unblocking at finite temperature and the presence of both reaction pair
nuclides in the shell. More recent reaction network calculations did
not include β⁻ decays as they were not considered to be important,
and any Urca cycling was estimated to be negligible. The importance of
Urca shell cooling is revealed here through the use of a full reaction
network that includes both electron capture and β⁻ decay on an equal
footing, that takes into account the rates of subsequent reactions that
deplete the electron-capture/β⁻-decay pairs, and that follows the evolution
in time of a fluid element as it is pushed through a reaction shell.

In order for this cycling of electron capture and β⁻ decays between
two nuclear species to occur, the nuclei involved must satisfy two conditions.
First, the transitions must proceed between low-lying states
(E_x ≲ kT is required for both the electron capture and the β⁻ decay). In
addition, within an electron-capture/β⁻-decay pair, the nucleus undergoing
β⁻ decay must not have a strong electron-capture branch, as these
electron captures would remove nuclei from the Urca cycle, thereby
reducing its effect or eliminating it entirely. The cooling rate depends
on the strength of the transition (the ft value, which is proportional to
the matrix element connecting the parent and daughter states) and the
energy threshold; the integration over the available phase space produces
a characteristic T⁵ scaling with temperature. The formation of
Urca shells with large cooling rates is enabled by strong nuclear deformations
that tend to spread nuclear electron-capture strength to lower
Figure 2 | Electron-capture/β⁻-decay pairs on a chart of the nuclides. The
thick blue lines denote electron-capture/β⁻-decay pairs that would generate a
strong neutrino luminosity in excess of 5 X 10° ergs | at T= 0.51 GK for a 
composition consisting entirely of the respective electron-capture/β⁻-decay
pair. They largely coincide with regions where allowed electron-capture
and β⁻-decay transitions are predicted to populate low-lying states and
subsequent electron capture is blocked (shaded squares, see also the discussion
in ref. 3). These are mostly regions between the closed neutron and proton
shells (pairs of horizontal and vertical red lines), where nuclei are significantly
deformed (see Supplementary Information section 4). Nuclides that are
β⁻-stable under terrestrial conditions are shown as squares bordered by
thicker lines. Nuclear charge numbers are indicated in parentheses next to
element symbols.
excited states, thereby lowering E_x (ref. 21; see Extended Data Fig. 1).
There are a number of electron-capture/β⁻-decay pairs that fulfil these
conditions for forming fast-cooling Urca shells (see Table 1 and Fig. 2).
The degree to which these shells are activated in a neutron star crust
depends on the initial composition produced by thermonuclear burning
on the neutron star surface. Because electron capture in the outer
crust preserves the mass number A, the abundance of an
electron-capture/β⁻-decay pair, and therefore its absolute neutrino luminosity,
is set by the abundance of nuclei with the same mass number in the
ashes of the surface thermonuclear burning. As is evident from Fig. 3,
neutrino cooling by Urca shells is by far the dominant neutrino emission
process in the crust for typical crust compositions.
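
The dependence of the Urca luminosity on composition and temperature can be sketched with the scaling given in the Table 1 footnote (proportional to X R² g⁻¹ T⁵, referenced to T = 0.51 GK, g = 1.85 × 10¹⁴ cm s⁻², R = 12 km and X = 1). The snippet below is a minimal illustration only: the function names are hypothetical, the luminosities are kept in the units of the Table 1 luminosity column, and the mass fractions in the example are invented placeholders rather than values from any ash composition in the paper.

```python
REF_T_GK = 0.51        # reference temperature of Table 1, GK
REF_G = 1.85e14        # reference gravitational acceleration, cm s^-2
REF_R_KM = 12.0        # reference neutron star radius, km


def urca_luminosity(l_table, mass_fraction, temp_GK, g=REF_G, radius_km=REF_R_KM):
    """Scale a Table 1 luminosity (quoted for X = 1) to other conditions,
    assuming L is proportional to X * R^2 * g^-1 * T^5."""
    return (l_table * mass_fraction
            * (radius_km / REF_R_KM) ** 2
            * (REF_G / g)
            * (temp_GK / REF_T_GK) ** 5)


def total_urca_luminosity(pairs, temp_GK):
    """Sum over (table luminosity, mass fraction) pairs at one temperature."""
    return sum(urca_luminosity(l, x, temp_GK) for l, x in pairs)


# Hypothetical composition: 2% of the mass at A = 29 and 5% at A = 55,
# using the Table 1 luminosities 24 and 11 for those pairs.
example_pairs = [(24.0, 0.02), (11.0, 0.05)]
for temp in (0.2, 0.35, 0.51):
    print(temp, total_urca_luminosity(example_pairs, temp))
```

The steep T⁵ dependence means that halving the temperature reduces each shell's luminosity by a factor of 32, which is why the shells act as an effective thermostat.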

The greatly enhanced crust neutrino emissivity at rather shallow depths
changes the long-standing assumption that rapidly accreting neutron
stars have a significant luminosity from deep crustal heating, which
directly influences thermonuclear burning in their accreted envelopes.
In the absence of crust Urca shells, if the crust has a low thermal conductivity,
and if the core neutrino emissivity is weak, then deep crustal
heating would generate significant heat flow towards the neutron star
surface. Models of thermonuclear bursts use this emergent luminosity
from the deep crust as a boundary condition, which sets in part
the ignition density and temperature. The presence of strongly
temperature-sensitive Urca cycles limits the temperature at the location of
the shells, however, and may even require an inward-directed luminosity
from the accreted envelope. Even for conditions in the deep
Figure 3 | Neutrino luminosities in the accreted neutron star crust. For
purposes of comparison, estimates of the neutrino luminosities were obtained
by integrating the corresponding emissivities over a neutron star crust with a
constant local temperature, following the methodology of ref. 10. Although
an actual neutron star crust is not isothermal, the temperature variation across
the crust is not large (typically less than a factor of 2). The Urca-shell
luminosities were calculated for superburst ashes (solid red line), X-ray burst
ashes produced by the rapid proton-capture (rp) process (dashed red line),
and a pure A = 29 composition (dotted red line) to demonstrate the maximum
effect. Also shown are the neutrino luminosities from electron-nucleus
bremsstrahlung (blue dot-dashed line) and plasmon decay (blue dashed line).
For comparison, we show estimates for the total crustal heating (shaded band).
This is based on a local heating rate of (1.9 MeV/nucleon) × Ṁ for a range
of accretion rates 10¹⁶ g s⁻¹ ≲ Ṁ ≲ 10¹⁸ g s⁻¹, which is representative for
observed neutron stars. Urca shells dominate the neutrino luminosity from
the crust and can balance the crust heating at moderate temperatures
≳2 × 10⁸ K. The temperature scaling assumes E_x < kT (see Supplementary
Information section 1).
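
The crustal-heating band in Fig. 3 follows from a simple unit conversion, sketched below: a heating rate of 1.9 MeV per accreted nucleon multiplied by the accretion rate. The sketch assumes one nucleon per atomic mass unit of accreted matter, and the helper for the balancing temperature assumes the T⁵ Urca scaling discussed in the text; the function names and the example accretion rate are illustrative choices, not values from the paper.

```python
MEV_TO_ERG = 1.602176634e-6      # erg per MeV
AMU_G = 1.66053906660e-24        # atomic mass unit in grams


def crustal_heating_luminosity(mdot_g_per_s, q_MeV_per_nucleon=1.9):
    """Heating luminosity in erg/s, assuming one nucleon per atomic mass unit."""
    nucleons_per_s = mdot_g_per_s / AMU_G
    return q_MeV_per_nucleon * MEV_TO_ERG * nucleons_per_s


def balance_temperature_GK(l_heat, l_urca_ref, t_ref_GK=0.51):
    """Temperature at which an Urca luminosity scaling as T^5 equals l_heat,
    given its value l_urca_ref at the reference temperature t_ref_GK."""
    return t_ref_GK * (l_heat / l_urca_ref) ** 0.2


# Illustrative accretion rate of 1e17 g/s (an assumed mid-range value).
print(f"{crustal_heating_luminosity(1e17):.3e} erg/s")
```

Because the heating is independent of temperature while the Urca luminosity rises as T⁵, the balance temperature depends only weakly (as the one-fifth power) on the accretion rate.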
crust that are favourable for sending a large heat flux to the surface, the
Urca shells re-emit this heat as neutrinos, thereby preventing it from
reaching the surface layers (see Supplementary Information section 3).

To establish the robustness of our conclusions with respect to nuclear
physics uncertainties, we used two different mass models, namely
FRDM and HFB-21. The use of FRDM masses instead of those
from HFB-21 reduces the Urca-shell neutrino luminosity by 90% for
a superburst ash composition (see Supplementary Information section
5 for details). For both mass models, the temperature at the superburst
ignition depth is <5 × 10⁸ K if the Urca shell neutrino emissivity is
included; and for both mass models the temperature has a significant
local minimum at the location of the Urca shell (see Extended Data Fig. 2).

This has important implications for the ignition of superbursts, which
are thought to be triggered by the unstable thermonuclear reaction
¹²C + ¹²C. The observed recurrence times, of the order of one year, are
much shorter than predictions of current models, which indicates
that the temperature at the ignition depth is underestimated. The presence
of Urca shells implies that this observation does not point to an
unexpectedly hot crust. Because of the Urca shells, we conclude that
the standard carbon-ignition scenario for superbursts requires a powerful,
as yet unknown, heat source that operates at surprisingly shallow
densities <10¹⁰ g cm⁻³ very near the carbon ignition layer. Alternatively,
other, more exotic, mechanisms would have to be found to ignite
and power superbursts.

Urca shell cooling therefore forces fundamental changes in current
superburst models. Realistic ignition conditions must now include a
strong localized heat source at a depth close to that of ignition, with a
strong heat flux flowing inwards into the Urca shell cooling layer. The
resulting temperature profile at ignition is therefore dramatically different
from that assumed previously, which may alter predicted light
curves. In addition, during the explosion, the temperature at the ignition
depth rises to ~10⁹ K. Heat from this layer will diffuse inward; on
the basis of the thermal diffusion timescale in the neutron star crust,
we estimate that in the absence of any neutrino emission, the temperature
would rise to ~10⁹ K at the depth of the Urca shell within a day
following ignition. The presence of a strong ‘heat sink’ at that depth,
however, will prevent the deeper layers from rising in temperature and
will, therefore, force the observed light curve to decay faster than expected
over timescales of roughly one day. Current superburst observations
on this timescale are rare and provide data of limited quality, but a
dedicated programme of superburst follow-up observations with current
instrumentation could address this problem. Detailed simulations,
which are beyond the scope of this Letter, are required to quantify the
effect of Urca shell cooling on superburst light curves.

Another observational signature might be found in neutron star cooling
following an accretion outburst. Unlike crustal heating, the rate of
Urca shell cooling does not scale with accretion rate, but rather depends
only on temperature. Cooling will therefore continue in transiently
accreting neutron stars once accretion has ceased, and might affect
observations of the cooling crusts in the hottest of these systems, such
as XTE J1701-462; in such systems, the Urca shell neutrino cooling
rate is comparable to typical photon luminosities of 10°°-10** ergs * 
for typical initial crust temperatures of (1-3) × 10⁸ K.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
Extended Data Figure 1 | Calculated proton and neutron single-particle
energy levels in ¹⁰⁵Zr as functions of nuclear deformation. Left panel, proton
levels; right panel, neutron levels. The 40 protons and 65 neutrons in ¹⁰⁵Zr fill
all levels up to the Fermi levels corresponding to these nucleon numbers
in the two diagrams (red dots). Levels corresponding to even parity are shown
as solid lines, those corresponding to odd parity as dashed lines. Shell gaps are
characterized by a particularly large separation in energy between two
adjacent single-particle levels. The numbers of protons or neutrons that occupy
levels up to the shell gap are indicated by circled numbers. The single-particle
levels are shown for a spherical nucleus in spectroscopic standard notation
(left side of each panel), and for a deformation near the calculated ground-state
shape of ¹⁰⁵Zr with quadrupole and hexadecapole shape-parameter values
ε₂ = 0.333 and ε₄ = 0.06, respectively (right side of each panel). The middle
section of each panel shows the change in level energies as ε₂ and ε₄ change from
spherical values ε₂ = ε₄ = 0 to deformed values. The well-known ‘magic
numbers’ 50 and 82 corresponding to particularly large gaps stand out at zero
deformation. When the nuclear shape becomes deformed, the spherical shell
gaps disappear, resulting in a large density of levels in the vicinity of the
Fermi level. This gives rise to a large number of states at low excitation in ¹⁰⁵Zr.
Some of these states can be populated by strong β⁻-decay transitions from
the ground state of ¹⁰⁵Y. The situation is similar for the electron capture on
¹⁰⁵Zr into deformed ¹⁰⁵Y.

[Figure panel axes: spheroidal deformation ε₂ and hexadecapole deformation ε₄ (horizontal); single-neutron energy in MeV (vertical)]

28. Nilsson, S. G. Binding states of individual nucleons in strongly deformed nuclei.
Mat.-fys. Meddr. 29, 1-69 (1955).
Extended Data Figure 2 | Temperature as a function of depth in the accreted
neutron star crust for different Urca shell cooling strengths. Here we use
P/g ≈ ∫ρ dz as a proxy for depth, where P is the pressure, g the local
gravitational acceleration, ρ the mass density and z the spatial depth coordinate.
As a baseline model, we fix the temperature to be T = 0.42 GK at
Pig=10°gcm * and T = 0.35 GK at the crust-core transition. In the absence 
of Urca shell cooling, the peak local temperature reaches 0.73 GK (solid curve)
with the temperature at the superburst ignition depth (P/g ≈ 10¹² g cm⁻²)
being 0.66 GK. With the addition of cooling using the HFB-21 mass model and
a superburst ash composition (blue dotted line), a local temperature minimum,
T = 0.33 GK, appears at the location of the Urca shell. Indeed, for these
conditions the temperature at the Urca shell is lower than that at the upper
boundary, so that a temperature inversion develops. Even for the much lower
Urca shell emissivity of the FRDM mass model (blue dashed line), the
temperature at the depth of the superburst ignition is <5 × 10⁸ K, which is
inconsistent with typical superburst ignition conditions. For both mass
models, the temperature has a local minimum at the location of the Urca shell.
The steady-state cooling luminosity from the shell is 2 × 10³⁶ erg s⁻¹ for the
HFB-21 mass model and 1.4 × 10³⁵ erg s⁻¹ for the FRDM mass model. As a
result, the Urca shell thermally decouples the envelope of light elements from
the heating in the deeper crust.
A featureless transmission spectrum for the
Neptune-mass exoplanet GJ 436b


Heather A. Knutson, Björn Benneke, Drake Deming & Derek Homeier


GJ 436b is a warm (approximately 800 kelvin) exoplanet that periodically
eclipses its low-mass (half the mass of the Sun) host star, and
is one of the few Neptune-mass planets that is amenable to detailed
characterization. Previous observations have indicated that its atmosphere
has a ratio of methane to carbon monoxide that is 10⁵ times
smaller than predicted by models for hydrogen-dominated atmospheres
at these temperatures. A recent study proposed that this
unusual chemistry could be explained if the planet’s atmosphere is
significantly enhanced in elements heavier than hydrogen and helium.
Here we report observations of GJ 436b’s atmosphere obtained during
transit. The data indicate that the planet’s transmission spectrum
is featureless, ruling out cloud-free, hydrogen-dominated atmosphere
models with an extremely high significance of 48σ. The measured
spectrum is consistent with either a layer of high cloud located at
a pressure level of approximately one millibar or with a relatively
hydrogen-poor (three per cent hydrogen and helium mass fraction)
atmospheric composition.

We observed four transits of the Neptune-mass planet GJ 436b on
26 October 2012 Universal Time (UT), 29 November 2012, 10 December
2012 and 2 January 2013 using the red grism (1.2-1.6 μm) on the Hubble
Space Telescope (HST) Wide Field Camera 3 instrument. These data
were obtained in a new scanning mode with a scan rate of 0.99″ per
second, which allowed us to achieve approximately a factor-of-twenty
improvement in the orbit-averaged efficiency compared to staring-mode
observations. Each visit spanned four HST orbits with an integration
time of 7.6 s per exposure. We extracted the spectra from the raw images
using the template-fitting technique described in a previous study,
and provide additional details of our reduction in the Methods.

We fitted the four wavelength-integrated (white-light) transit curves
simultaneously while accounting for detector effects (see discussion
in the Methods), to determine values for the planet’s orbital inclination
i, the planet-star radius ratio R_p/R_*, the ratio of the planet’s semi-major
axis a to the stellar radius R_*, and the centre-of-transit time T_c. We set
the uncertainties on each measurement equal to the standard deviation
of the residuals from our best-fit solution for that visit and evaluated
the uncertainties on our best-fit parameters using the covariance matrix
from our Levenberg-Marquardt least-squares minimization, a Markov
Chain Monte Carlo analysis, and a residual permutation technique that
better accounts for the presence of time-correlated noise in the data.
The residual permutation approach results in uncertainties that are a
factor of 1.5-2 larger than both the covariance matrix and the Markov
Chain Monte Carlo errors for all of our fitting parameters, and we took
those to be our final errors.
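
The idea behind a residual permutation estimate can be sketched as follows. This is a deliberately simplified illustration, not the authors' pipeline: it assumes a box-shaped transit with a fixed baseline of 1 and a single free parameter (the depth), and the function names and synthetic data are hypothetical. The residuals from the best fit are cyclically shifted in time, added back to the best-fit model, and the fit is repeated; the scatter of the refitted depths serves as the uncertainty, so that any time-correlated structure in the residuals is preserved.

```python
import random
import statistics


def fit_depth(flux, in_transit):
    """Least-squares depth of a box model with the baseline fixed at 1:
    the mean flux deficit over the in-transit points."""
    deficits = [1.0 - f for f, inside in zip(flux, in_transit) if inside]
    return sum(deficits) / len(deficits)


def residual_permutation_error(flux, in_transit):
    """Return (best-fit depth, scatter of depths refitted to shifted residuals)."""
    depth = fit_depth(flux, in_transit)
    model = [1.0 - depth if inside else 1.0 for inside in in_transit]
    residuals = [f - m for f, m in zip(flux, model)]
    trial_depths = []
    for shift in range(1, len(residuals)):
        # Cyclic shift keeps the time ordering (and correlations) of the noise.
        shifted = residuals[-shift:] + residuals[:-shift]
        synthetic = [m + r for m, r in zip(model, shifted)]
        trial_depths.append(fit_depth(synthetic, in_transit))
    return depth, statistics.pstdev(trial_depths)


# Synthetic example: 200 points, transit in samples 80-119, white noise only.
random.seed(1)
in_transit = [80 <= i < 120 for i in range(200)]
flux = [(1.0 - 0.007 if inside else 1.0) + random.gauss(0.0, 1e-4)
        for inside in in_transit]
depth, depth_err = residual_permutation_error(flux, in_transit)
print(f"depth = {depth:.5f} +/- {depth_err:.5f}")
```

For purely white noise this estimate should be comparable to the formal least-squares error; it becomes larger when the residuals contain correlated structure, which is the behaviour described in the text.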

Our best-fit parameters for the white-light transits are given in
Table 1, and the normalized transit light curves are shown in Fig. 1.
We see no evidence for transit depth variations comparable to those
reported with the Spitzer data, and we do not detect any star spot
occultations in our transit light curves that are similar to those observed
for HD 189733b (refs 16 and 17). We next determined the differential
wavelength-dependent transit depths in twenty-eight bins spanning
wavelengths between 1.14 μm and 1.65 μm, as described in Fig. 2 and
the Methods. The resulting transmission spectrum is shown in Fig. 2,
with error bars that include both the uncertainties in the measured
transit depth and in the stellar limb-darkening models.

We interpreted the transmission spectrum using a variation of the
Bayesian atmospheric retrieval framework described in a previous study;
we provide a summary of this approach in the Methods. We find that
we obtain the best match to our data using models with either
high-altitude clouds or a relatively hydrogen-poor atmosphere with a reduced
scale height and correspondingly small absorption features. At low
metallicities our model requires a haze or cloud layer at pressures below
10 mbar, because the large scale height of these models otherwise leads
to strong spectral signatures from molecular absorption (for example,
H₂O, CH₄, CO or CO₂). At higher metallicities the scale height of the
atmosphere is reduced, and no clouds are needed to produce an effectively
flat transmission spectrum. Our conclusions are similar to those
obtained for the transiting super-Earth GJ 1214b (ref. 12), although in
this case new upper limits on the planet’s transmission spectrum indicate
that it must have a high cloud layer even if the atmosphere is
metal-rich. GJ 436b is four times more massive with a nearly identical average
density and therefore seems a less obvious candidate for a hydrogen-poor
atmosphere. The 68% (1σ) Bayesian credible region (confidence
region) extends along a curve from hydrogen-dominated models with
a high-altitude cloud layer between 0.01 mbar and 4 mbar to
high-metallicity models that may or may not contain clouds.
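
The link between composition and feature size runs through the pressure scale height, H = kT/(μ m_u g), which shrinks as the mean molecular weight μ rises. The sketch below illustrates this with the approximately 800 K temperature quoted in the abstract. The surface gravity (12.8 m s⁻²) and the two mean molecular weights (2.3 for a hydrogen-dominated gas, 18 for a water-dominated one) are illustrative assumptions that do not come from the text, and the function name is hypothetical.

```python
K_B = 1.380649e-23          # Boltzmann constant, J K^-1
M_U = 1.66053906660e-27     # atomic mass unit, kg


def scale_height_km(temp_K, mean_molecular_weight, gravity_m_s2):
    """Pressure scale height H = k T / (mu * m_u * g), returned in km."""
    return K_B * temp_K / (mean_molecular_weight * M_U * gravity_m_s2) / 1e3


ASSUMED_GRAVITY = 12.8  # m s^-2, an assumed value for illustration only

h_hydrogen = scale_height_km(800.0, 2.3, ASSUMED_GRAVITY)   # H2/He-dominated
h_heavy = scale_height_km(800.0, 18.0, ASSUMED_GRAVITY)     # water-dominated
print(f"H (mu = 2.3) = {h_hydrogen:.0f} km, H (mu = 18) = {h_heavy:.0f} km")
```

The ratio of the two scale heights is simply the inverse ratio of the mean molecular weights, independent of the assumed gravity, so absorption features in the heavier atmosphere are correspondingly smaller.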

For atmospheres reflecting the Sun’s bulk composition (primarily
hydrogen and helium with small amounts of heavier elements), a cloud
or haze layer at 1 mbar that is optically thick for slant viewing geometries
represents the best fit to the data. Zinc sulphide and potassium
chloride are both plausible candidates for the composition of the cloud,
because the condensation curves of these substances can readily cross
the pressure-temperature profile at the millibar level in GJ 436b’s
atmosphere. A recent study of the super-Earth GJ 1214b, which is also a


Table 1 | Best-fit 1c transit parameters from white-light curves 


Visit date R,/Re Te (BUDtpe) 
26 October 2012 uT 0.08349(33) 2456226.69131(11) 
29 November 2012 ut 0.08413(26) 2456261.06211(10) 
10 December 2012 ut 0.08372(31) 245627 1.63758(10) 
2 January 2013 ut 0.08310(27) 2456295.43270(7) 
Average inclination, / (°) 86.774(30) 
Average a/R» 14.41(10) 


We find that our estimates (averaged over all dates) for the orbital inclination i and the ratio of the
semi-major axis a to the stellar radius R⋆ are consistent at the 3σ level with our previously published 8-μm
Spitzer transit observations³, and that our individual values for the planet-star radius ratio Rp/R⋆ are
mutually consistent at the 3σ level. We derive a new transit ephemeris with a centre-of-transit time
T0 = 2456295.431924(45) BJD_TDB (where T0 is the best-fit zero point for the linear ephemeris fit) and
an orbital period of P = 2.64389782(8) days by combining our data with previous studies; our new
data extend the current baseline for this object by almost 4 years. Errors are given in parentheses after
the best-fit value for the orbital period, with the number N of significant digits in the quoted errors
corresponding to the last N significant digits of the best-fit value. For example, the stated orbital period
corresponds to a value of 2.64389782 ± 0.00000008 days using standard notation. BJD, barycentric
Julian date for the measured centre-of-transit times. To convert from TDB to Coordinated Universal
Time (UTC) time standards, simply subtract 67.18 s from the reported centre-of-transit times, Tc.
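
The parenthetical error notation used in the table can be expanded mechanically. The following is a minimal sketch, not part of the original analysis; the function name `expand_parenthetical` is hypothetical, and the only input assumed is a string in the form value(error digits).

```python
from decimal import Decimal


def expand_parenthetical(text):
    """Expand compact notation such as '2.64389782(8)' into (value, error).

    Hypothetical helper for illustration only.
    """
    value_part, _, rest = text.partition("(")
    error_digits = rest.rstrip(")")
    value = Decimal(value_part)
    # The quoted error digits line up with the last digits of the value,
    # so they share the exponent of the value's final decimal place.
    exponent = value.as_tuple().exponent
    error = Decimal(int(error_digits)).scaleb(exponent)
    return value, error


period, period_error = expand_parenthetical("2.64389782(8)")
print(f"P = {period} +/- {period_error:f} days")
```

Using `Decimal` rather than floating point keeps every quoted digit exact, which matters for quantities such as transit times that carry twelve or more significant figures.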
Figure 1 | White-light transit curves for the four individual visits. Data are
vertically offset for clarity. Transits were observed on the following dates (from
top to bottom): 26 October 2012 UT, 29 November 2012 UT, 10 December 2012
UT and 2 January 2013 UT. Normalized data with the first orbit trimmed and
instrumental effects removed are shown as black filled circles. Best-fit model 
transit light curves are shown as black lines. The data consist of three spacecraft 
orbits with durations of approximately 1.5 hours each; there is a gap during 
each orbit where the spacecraft passes behind the Earth and the target is no 
longer visible. 


good candidate for having these cloud species, indicates that a solar- 
composition atmosphere would not have sufficient amounts of con- 
densable material to form optically thick clouds. If the atmospheric 
metallicity (defined as the abundances of elements heavier than hydro- 
gen and helium, H and He) is enhanced above this level, then such 
clouds could easily explain this planet’s flat transmission spectrum”®. 
Alternatively, photochemical haze production could lead to an opaque 
cloud layer at the millibar level, although these models probably also 
require an enhanced atmospheric metallicity. 
Figure 2 | Averaged transmission spectrum for GJ 436b. Black filled circles
indicate the error-weighted mean transit depth in each bandpass, with the
plotted uncertainties calculated as the sum in quadrature of the 1σ standard
deviation measurement errors and the systematic uncertainties from stellar 
limb-darkening models. We show three models for comparison, including a 
solar-metallicity cloud-free model (red line), a hydrogen-poor 1,900-times- 
solar model (blue line) and a solar-metallicity model with optically thick clouds 
at 1 mbar (green line). 
Figure 3 | Joint constraints on cloud-top pressure versus atmospheric
metallicity. The coloured shading indicates the normalized probability density 
as a function of metallicity (defined as the relative abundance of elements 
heavier than H and He) and cloud top pressure derived using our Bayesian 
atmosphere retrieval framework'*”°. The black contours show the 68%, 95% 
and 99.7% Bayesian credible regions. We rule out a hydrogen-dominated, 
cloud-free, solar-metallicity atmosphere with a significance of 48σ.
Colour-matching markers indicate the three models plotted in Fig. 2, and vertical
dashed lines indicate constraints on the planet’s composition from 
measurements of its average density. 


Previous studies have placed constraints on the possible bulk com- 
positions for GJ 436b using the planet’s measured mass and radius, the 
estimated age of the system, and models of planet formation and 
migration’. A recent survey of the published literature for this 
planet indicated that current models are consistent with bulk metalli- 
cities between 230 and 2,000 times solar, depending on the assumed 
ratio of rock to ice, the distribution of metals between the core and the 
envelope, the interior temperature of the planet, and other related 
factors®. This corresponds to a H/He mass fraction of approximately 
3%-22%. On the basis of this analysis we conclude that an atmospheric 
metallicity of 1,900 times solar is consistent with current constraints 
for the planet’s bulk composition, although it is very close to the upper 
end of this range. Mass loss appears to be minimal for GJ 436b under 
present conditions**”®, but it is possible that the higher ultraviolet and 
X-ray fluxes expected for young stars could have resulted in the loss of 
some atmospheric hydrogen very early in the planet’s history”””’. 
Although a recent study” argued that such mass loss is unlikely to result 
in significant depletion of hydrogen relative to other elements, addi- 
tional modelling work is still needed to provide a more definitive 
resolution to this question. 

There are several potential ways to distinguish between cloudy and 
high-atmospheric-metallicity scenarios for GJ 436b’s atmosphere. An 
unambiguous solution would be to obtain more precise, moderate- 
resolution transmission spectroscopy” capable of detecting near-infrared 
absorption features and directly constraining the mean molecular weight. 
We emphasize, however, that the apparent variations in the planet’s 
measured transit depth from one epoch to the next (see Methods) make 
simultaneous measurements essential for robust constraints on this 
planet’s transmission spectrum. A hydrogen-dominated atmosphere 
with a high cloud or haze layer should exhibit attenuated water absorp- 
tion features, which could be distinguished from the intrinsically weaker 
features of high-metallicity atmospheres on the basis of the steepness of 
the wings of the absorption lines. Similarly, a detection of the Rayleigh 
scattering slope in the planet’s visible-light transmission spectrum could 
directly constrain both the mean molecular weight and the amount of 
spectrally inactive gas (that is, H and He or haze particles) present in 
the atmosphere”. Alternatively, one could differentiate between
hydrogen-dominated and high-metallicity atmospheres by measuring
the relative abundances of CO, CO₂, methane and water. These relative
abundances could then be compared to different chemical models for 
GJ 436b’s atmosphere that might rule out the presence of significant 
molecular hydrogen. Improved constraints on the atmospheric chem- 
istry from secondary eclipse spectroscopy”, which is less sensitive to 
high-altitude clouds and stellar activity, could also help to restrict the 
range of plausible atmospheric compositions in the limit of a well- 
mixed atmosphere (that is, no significant compositional gradients 
between the day and night sides). Lastly, improved estimates for the 
stellar mass and radius would help to reduce the uncertainties in the 
corresponding planetary values and hence better constrain its mean 
density. 


METHODS SUMMARY 


We calculate this transmission spectrum as follows: first we determine the differ- 
ence between our extracted spectrum and a best-fit template spectrum at each 
pixel position and create a time series of the residuals. We then fit this time series 
with a model consisting of the difference between the white-light transit curve and 
a transit light curve with a freely varying planet-star radius ratio. We also include a 
linear function of time to account for the first order of any remaining instrumental 
trends. We compare the errors on the planet-star radius ratio from the Levenberg- 
Marquardt covariance matrix and the residual permutation method and take the 
larger of the two as our final uncertainty; they typically agree to within 10%. We 
then average the planet-star radius ratios in four-pixel-wide segments to create 
our final transmission spectrum for each visit, in which we select our wavelength 
range to exclude the low-illumination regions at the edges of the spectrum. We 
calculate uncertainties on each bin as the average of the errors for the four indi- 
vidual radius ratios to account for the four-pixel-wide Gaussian smoothing func- 
tion we applied to the raw spectra before fitting the template spectrum. We combine 
the data from our four visits by taking the error-weighted mean of the transit 
depths in each wavelength bin. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Spectral extraction. We follow the method described in a previous study¹¹ and
summarize the main steps here for reference. We obtain our data using the 256 × 256
pixel subarray and the SPARS10 mode with two samples (see section 13.3.6 of the
HST Phase II Proposal Instructions for Cycle 21, available at
http://www.stsci.edu/ftp/documents/p2pi/p2pi.pdf), which has an effective integration time of 7.62 s.
These data are available for download from the Barbara A. Mikulski Archive for
Space Telescopes (MAST) as part of proposal number 11622. We extract our
spectra from the raw _ima.fits image files (see Extended Data Fig. 1 for a
representative image), because the method for calculating the fluxes for the _flt.fits files
does not work for data obtained in drift scan mode. The _ima.fits files were
processed using either version 2.7 (all 2012 transits) or version 3.0 (January 2013 transit) of
the CALWF3 pipeline, which applies a standard set of calibrations including dark
subtraction, linearity correction, cosmic ray rejection, and a conversion from raw
counts to flux units as described in section 3.2.3 of the WFC3 Data Handbook.

The _ima.fits files retrieved through MAST contain an array of three images
produced by the sample-up-the-ramp readout. These images were taken 0 s, 0.28 s
and 7.62 s after the start of the exposure, and we refer to them as images A, B and C,
respectively. We convert each image from units of electrons per second to
electrons using the appropriate integration times and difference each pair of sampled
images (B − A and C − B). We then trim out a central region encompassing the
location of the spectrum in each differenced image and add the trimmed
differenced images together (that is, (B − A) + (C − B)) to create our final science image
(the working image that we use to extract the spectrum). Because we use a different
sub-aperture for each differenced image, our final combined (B − A) + (C − B)
science image is not simply equal to the (C − A) image. Previous studies¹¹ used this
differenced image approach to minimize contamination from the sky background 
in the scanned images, therefore avoiding the need for a separate background 
subtraction step. Although we adopt the same approach in this study, our mask 
excludes only the sky background from the first 0.28 s of the 7.62 s integration and 
we therefore include a separate sky-subtraction step later in our analysis. We find 
that for our data this image differencing approach gives results that are identical to 
the case where we simply use the third (C) sample-up-the-ramp image as our 
science image with no subtraction or masking. 
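
The differencing of the sample-up-the-ramp reads can be sketched as follows. This is an illustrative outline only: the function name, the boolean sub-aperture masks and the toy arrays are assumptions made for the example, and the real sub-apertures used in the analysis are not specified here.

```python
import numpy as np


def differenced_science_image(rate_a, rate_b, rate_c, t_a, t_b, t_c,
                              mask_ba, mask_cb):
    """Combine three reads (in electrons per second) into one science image.

    mask_ba and mask_cb are hypothetical boolean sub-aperture masks that are
    True inside the region kept from each differenced image.
    """
    # Convert each read from electrons per second to accumulated electrons.
    a, b, c = rate_a * t_a, rate_b * t_b, rate_c * t_c
    # Difference consecutive reads and keep only each read's sub-aperture.
    return (b - a) * mask_ba + (c - b) * mask_cb


# Toy example: a uniform source of 100 electrons per second on a 4 x 4 array.
rate = np.full((4, 4), 100.0)
full = np.ones((4, 4), dtype=bool)
image = differenced_science_image(rate, rate, rate, 0.0, 0.28, 7.62, full, full)
```

With identical masks the two differences telescope to C − A, and because read A is taken at 0 s the result equals read C in electrons. The combined image differs from C − A only where the two sub-apertures differ.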

We next select a sub-aperture centred on the position of the stellar spectrum in 
our science image with dimensions of 160 pixels in the x (dispersion) direction and 
71 pixels in the y (cross-dispersion) direction. We use a fixed aperture position for 
each visit, and estimate the position of the star using an acquisition image obtained 
at the start of each visit. Unlike previous studies¹¹, we find that using a narrow
aperture that cuts off at half the maximum flux produces an increased scatter in
our white-light photometry; this may be due to the larger orbit-to-orbit position
drift in our images compared to HD 209458b. In this case we obtain optimal results
with an aperture that extends out to the wings of the point spread function in both 
dimensions. 

We apply a colour-dependent flat-field correction and calculate wavelength 
solutions for our differenced images using coefficients adapted from the standard 
Space Telescope Science Institute pipeline as described in other studies (ref. 11 and 
Wilkins, A. N. et al., personal communication). We then apply a filter to remove 
bad pixels and cosmic ray hits by first dividing each row of the individual subarrays 
by the total flux in that row (this corrects for the uneven scan rate in the y direction) 
and then iteratively flagging 8σ and then 4σ outliers in the time series at each pixel
position using a moving median filter with a width of five pixels (that is, we 
calculate the median flux value at that pixel position starting from two images 
before and ending two images after our science image). We replace flagged pixels 
with the value of the moving median filter at that position, then multiply each row 
by the original flux total from that row to restore the initial subarray with bad pixels 
removed. Approximately 0.04%-0.06% of the pixels within the subarray aperture 
are flagged as bad in our four visits. We then sum in the y (cross-dispersion) direction 
to create a one-dimensional spectrum from each image. 
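
The bad-pixel filter described above can be outlined in code. This is a sketch under stated assumptions rather than the pipeline itself: the scatter is estimated here as the standard deviation of the residuals in time at each pixel, the edges are padded by repeating the first and last images, and the function name `clean_bad_pixels` is hypothetical. None of these choices is specified in the text.

```python
import numpy as np


def clean_bad_pixels(cube, thresholds=(8.0, 4.0), width=5):
    """Replace outlying pixels in an image cube of shape (n_images, n_rows, n_cols)."""
    cube = np.asarray(cube, dtype=float)
    # Divide each row by its total flux to correct for the uneven scan rate.
    row_totals = cube.sum(axis=2, keepdims=True)
    norm = cube / row_totals
    half = width // 2
    n_images = norm.shape[0]
    for k in thresholds:
        # Moving median in time at each pixel position (two images before
        # to two images after for width=5); edge padding is an assumption.
        padded = np.pad(norm, ((half, half), (0, 0), (0, 0)), mode="edge")
        windows = np.stack([padded[i:i + n_images] for i in range(width)])
        median = np.median(windows, axis=0)
        residual = norm - median
        # Assumed scatter estimate: standard deviation in time per pixel.
        sigma = residual.std(axis=0, keepdims=True)
        bad = np.abs(residual) > k * sigma
        norm = np.where(bad, median, norm)
    # Multiply by the original row totals to restore the flux scale.
    return norm * row_totals
```

Because the row total includes the flagged pixel, the restored row carries a small bias when the outlier is large relative to the row flux. This is a property of the normalization step as described, and is negligible when only a few hundredths of a per cent of pixels are affected.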

We calculate the MJD mid-exposure times corresponding to each spectrum 
from the headers of the _flt.fits files, and convert these times to BJD_TDB using publicly
available routines*’. The median sky background in our _ima.fits images is approxi- 
mately 0.1% of the total flux when we sum over the spectrum. We see no evidence 
for any wavelength or time dependence in the background flux, and so simply 
subtract the median background level from each visit. 

As noted in a previous study¹¹, the WFC3 spectra are undersampled and this can
create problems when fitting templates with slightly offset positions in the disper- 
sion direction. We mitigate this issue by convolving all of our one-dimensional 
spectra with a Gaussian function with a full width at half maximum (FWHM) of 
four pixels; this slightly degrades the wavelength resolution of our spectra, but the 
loss is negligible because we ultimately bin our transmission spectrum in four- 
pixel-wide bins. We next create a template spectrum for each visit by averaging ten 
spectra immediately before and immediately after the transit. We fit the template
spectrum to the central 112 pixels of the individual spectra from each visit,
allowing the template amplitude to vary freely and the relative positions to shift in
increments of a thousandth of a pixel.
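
The smoothing and template fit can be illustrated with a short sketch. It assumes a simple grid search over sub-pixel shifts with linear interpolation and a closed-form least-squares amplitude at each shift; the function names and the interpolation scheme are assumptions for the example and are not taken from the analysis.

```python
import numpy as np

# Conversion between a Gaussian's FWHM and its standard deviation.
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def gaussian_smooth(spectrum, fwhm_pixels=4.0):
    """Convolve a one-dimensional spectrum with a normalized Gaussian kernel."""
    sigma = fwhm_pixels * FWHM_TO_SIGMA
    half = int(np.ceil(4 * sigma))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(spectrum, kernel, mode="same")


def fit_template(spectrum, template, shifts):
    """Return (amplitude, shift, sum of squared residuals) of the best fit."""
    pixels = np.arange(template.size)
    best = None
    for shift in shifts:
        # Shift the template by a fraction of a pixel (linear interpolation).
        shifted = np.interp(pixels, pixels + shift, template)
        # Least-squares amplitude for this shift.
        amplitude = np.dot(shifted, spectrum) / np.dot(shifted, shifted)
        ssr = np.sum((spectrum - amplitude * shifted) ** 2)
        if best is None or ssr < best[2]:
            best = (amplitude, shift, ssr)
    return best
```

The best-fit amplitude at each time step is what forms the white-light curve, while the residuals between each spectrum and its scaled, shifted template form the wavelength-dependent time series.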

Transit fits. The template-fitting technique results in two kinds of data product: 
first, a white-light curve for each transit calculated from the best-fit amplitude for 
the template spectrum at each time step, and second, a set of wavelength-dependent 
time series calculated from the difference of the best-fit template spectra and the 
measured spectra at each pixel position. This method is designed to remove common- 
mode white-light instrumental effects from the differenced spectra without the 
need to assume a functional form for these effects, resulting in lower noise levels in 
the final transmission spectrum than other commonly used approaches'***”?, 

Our fits to the white-light transit photometry (Fig. 1 and Extended Data Figs 2 
and 3) include a linear function of time and a linear-plus-exponential function of 
orbital phase (five free parameters) to describe the behaviour of the instrument. 
We assume that i and a/R⋆ are the same in all visits, but allow Rp/R⋆, Tc and the
instrumental terms to vary individually. We trim the data from the first orbit in our 
fits to the white-light data, because these data display larger-than-usual instru- 
mental effects owing to settling at the new pointing. We keep this first orbit when 
we fit the differential transmission spectra (Extended Data Fig. 4) because there 
does not appear to be any evidence for colour-dependent instrumental effects at 
the start of each light curve and this gives us a longer baseline for our residual 
permutation error estimation. We set the uncertainties on each white-light
measurement equal to the standard deviations of the residuals from our best-fit
solution for that visit, which are 10.0 × 10⁻⁵ for the October, 9.1 × 10⁻⁵ for the
November, 8.9 × 10⁻⁵ for the December and 7.8 × 10⁻⁵ for the January visits,
respectively. These residuals are a factor of 1.2-1.5 times higher than the
white-light photon noise limit of 6.4 × 10⁻⁵, reflecting the uncorrected instrumental
effects visible in Extended Data Fig. 3. The χ² value for our simultaneous fit to
all four white-light transit curves is 356.7, with a total of 360 points in our fit and 26 
free parameters. We also compare the root mean square of the residuals in our 
four-pixel bands to the photon noise limit in those bands and find median values 
ranging between 1.03-1.07 times the photon noise limit for our four individual 
transit observations. We calculate our errors on the wavelength-dependent transit 
depths. 

We show our best-fit transit times in comparison to previously published values 
in Extended Data Fig. 5. Although we do not expect the planet radius to vary in 
time, previous studies have reported variations in the measured transit depths at 
different epochs, which could be caused by the occultation of bright or dark 
regions on the stellar surface***. We show the best-fit transit depths from our four 
white-light curves in comparison to these previous studies in Extended Data Fig. 6. 
In Extended Data Fig. 7 we plot the Ca II H and K activity index for this star as a
function of time; although sampling is poor at the epoch of our HST observations, 
the star appears to have an average-to-low activity level at this time. This may 
explain the relatively small scatter in our measured transit depths over the two 
months spanned by our four transit observations compared to previous Spitzer 
observations. We calculate a reduced χ² value of 2.7 for our four transit depths
compared to the averaged value, suggesting that stellar activity may still be con- 
tributing some extra variability. 

For the wavelength-dependent transit fit, we calculate the transmission spec- 
trum as follows: first we determine the difference between our extracted spectrum 
and a best-fit template spectrum at each pixel position and create a time series of 
the residuals. We then fit this time series with a model consisting of the difference 
between the white-light transit curve and a transit light curve with a freely varying 
planet-star radius ratio. We also include a linear function of time to account for 
the first order of any remaining instrumental trends. We compare the errors on the 
planet-star radius ratio from the Levenberg-Marquardt covariance matrix and the 
residual permutation method and take the larger of the two as our final uncer- 
tainty; they typically agree to within 10%. We then average the planet-star radius 
ratios in four-pixel-wide segments to create our final transmission spectrum for 
each visit (Extended Data Fig. 4), in which we select our wavelength range to 
exclude the low-illumination regions at the edges of the spectrum. We calculate 
uncertainties on each bin as the average of the errors for the four individual radius 
ratios to account for the four-pixel-wide Gaussian smoothing function we applied 
to the raw spectra before fitting the template spectrum. We combine the data from 
our four visits by taking the error-weighted mean of the transit depths in each 
wavelength bin (Fig. 2). 
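
The combination of the four visits can be sketched as an inverse-variance weighted mean. The depths and errors below are made-up placeholder values, not measurements from this work, and the formula used for the uncertainty of the mean is the standard inverse-variance result, which is an assumption because the text does not state how that uncertainty was propagated.

```python
import numpy as np


def error_weighted_mean(depths, errors):
    """Combine per-visit transit depths, shaped (n_visits, n_bins)."""
    depths = np.asarray(depths, dtype=float)
    errors = np.asarray(errors, dtype=float)
    weights = 1.0 / errors ** 2
    mean = (weights * depths).sum(axis=0) / weights.sum(axis=0)
    # Standard inverse-variance uncertainty (assumed, see the note above).
    mean_error = 1.0 / np.sqrt(weights.sum(axis=0))
    return mean, mean_error


# Placeholder numbers in p.p.m. for two visits and two wavelength bins.
depths = [[7000.0, 6950.0], [7100.0, 6990.0]]
errors = [[50.0, 40.0], [50.0, 80.0]]
mean, mean_error = error_weighted_mean(depths, errors)
```

Visits with smaller errors dominate the average, so a single noisy visit cannot pull a wavelength bin far from the value favoured by the others.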

Limb-darkening models. We compare results for both our white-light fits and 
our differential transmission spectra using fixed four-parameter nonlinear
limb-darkening coefficients calculated from a PHOENIX
(http://phoenix.astro.physik.uni-goettingen.de/) stellar atmosphere model”*. We first calculate the average
centre-to-limb intensity profile for the nominal wavelength range of each individual pixel,
then convolve the resulting model spectrum at each radial position on the star 
with a four-pixel-wide Gaussian function in wavelength space to account for the 
smoothing applied to our measured spectra. We then fit the smoothed intensity
profiles at each pixel position with a four-parameter nonlinear limb-darkening 
profile’® and use those limb-darkening coefficients to calculate our transit light 
curves. 
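
For reference, the widely used four-parameter nonlinear limb-darkening law has the form I(μ)/I(1) = 1 − Σ c_k (1 − μ^(k/2)) for k = 1 to 4, where μ is the cosine of the angle between the line of sight and the surface normal. The sketch below assumes this is the profile meant in the text and fits it by linear least squares, which is possible because the law is linear in the coefficients; the function names and the synthetic intensity profile are illustrative only.

```python
import numpy as np


def nonlinear_limb_darkening(mu, coeffs):
    """Four-parameter nonlinear law: I(mu)/I(1) = 1 - sum_k c_k (1 - mu**(k/2))."""
    mu = np.asarray(mu, dtype=float)
    profile = np.ones_like(mu)
    for k, c_k in enumerate(coeffs, start=1):
        profile -= c_k * (1.0 - mu ** (k / 2.0))
    return profile


def fit_nonlinear_limb_darkening(mu, intensity):
    """Least-squares coefficients for a normalized intensity profile."""
    mu = np.asarray(mu, dtype=float)
    # Each column is one basis term (1 - mu**(k/2)); the law is linear in c_k.
    design = np.column_stack([1.0 - mu ** (k / 2.0) for k in range(1, 5)])
    coeffs, *_ = np.linalg.lstsq(design, 1.0 - np.asarray(intensity), rcond=None)
    return coeffs


# Synthetic profile with arbitrary coefficients, then recover them.
mu = np.linspace(0.05, 1.0, 50)
true_coeffs = np.array([0.5, -0.2, 0.3, -0.1])
fitted = fit_nonlinear_limb_darkening(mu, nonlinear_limb_darkening(mu, true_coeffs))
```

In practice the input intensities would come from the smoothed stellar model at each pixel position rather than from a synthetic profile.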

A recent study’ estimated an effective temperature of 3,416 ± 54 K for GJ 436
based on new interferometric radius measurements. We therefore consider four 
different stellar atmosphere models with effective temperatures ranging between 
3,350 K and 3,500 K. We show plots of the disk-integrated spectra for the hottest 
and coldest models compared to the measured spectra for each visit in the WFC3 
band in Extended Data Fig. 8. Our choice of limb-darkening model has a relatively 
small effect on the overall shape of our measured transmission spectrum, and we 
quantify this effect as a systematic error term in Extended Data Table 1. We estimate 
the contribution of the limb-darkening model errors by calculating the change in 
the measured transit depth in a given band over a range of 3,350-3,450 K in the 
stellar effective temperature used for the limb-darkening models. We then add 
these errors in quadrature to the measurement errors when comparing our results 
to model transmission spectra, and show the combined errors in Fig. 2. 
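
Adding the two error terms in quadrature is a one-line calculation. The sketch below uses the values from the first row of Extended Data Table 1 (a measurement error of 60 p.p.m. and a limb-darkening error of 7 p.p.m.) purely as a worked example; the function name is hypothetical.

```python
import numpy as np


def combined_error(measurement_error, limb_darkening_error):
    """Sum two independent error terms in quadrature."""
    return np.hypot(measurement_error, limb_darkening_error)


# First row of Extended Data Table 1, in p.p.m.
total = combined_error(60.0, 7.0)
print(f"combined error: {total:.1f} p.p.m.")
```

Because the terms add in quadrature, the limb-darkening contribution matters mainly in bands where it is comparable to the measurement error.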

We also compare the fits to the white-light curves using different stellar
atmosphere models and find a χ² value of 357.2 for the 3,350 K model, 356.7 for the
3,400 K model, 357.2 for the 3,450 K model and 357.6 for the 3,500 K model, an 
effectively negligible difference. Without a strong preference for one model over 
the other, we elect to use the 3,400 K model in our final analysis for consistency with 
the published temperature estimate*’. We tried fits with a linear limb-darkening 
coefficient as a free parameter at each wavelength, where we constrained these 
coefficients to vary within the range spanned by the model coefficients for stellar 
effective temperatures between 3,350 K and 3,450 K. We obtained a transmission 
spectrum that was consistent with our previous results, but with significantly larger 
uncertainties. This may be due to our choice of a linear parameterization for limb 
darkening, which provides a quantifiably poorer fit to the white-light transit curves, 
or to weak constraints on the limb-darkening profile due to GJ 436b’s near-grazing 
geometry (impact parameter b = 0.85; see our previous study of this planet* for a 
more detailed discussion of this geometry and its effect on our ability to empirically 
constrain limb-darkening profiles). 

Atmospheric retrieval. The observed transmission spectrum is interpreted using
a variant of the atmospheric retrieval method described in previous studies'***. 
The method used in this work combines a self-consistent, line-by-line atmospheric 
forward model and the nested sampling technique to efficiently compute the joint 
posterior probability distribution of the desired atmospheric parameters from the 
observed transmission spectrum. The main variation from the method described 
in our most recent paper”? is that the analysis in this work employs our a priori 
knowledge of chemistry to limit the range of atmospheric compositions to scen- 
arios that are chemically plausible. 

The goal of the retrieval analysis is to determine the range of metallicities (Fe/H) 
and cloud top pressures that are in agreement with the data (Fig. 3). Rather than 
fitting the data with unconstrained combinations of molecular abundances, however, 
we compare the observations only to atmospheres that are chemically plausible. Our 
approach is to compute the chemical equilibrium abundances and the temperature- 
pressure profiles self-consistently, while accounting for the uncertainties in the 
modelling of the methane abundance and the unknown Bond albedo through 
treating them as additional free parameters and marginalizing over them. In total, 
we perform a retrieval analysis in the five-dimensional-parameter space spanned 
by the metallicity, the cloud top pressure, the methane abundance relative to chem- 
ical equilibrium, the Bond albedo and the reference planet-to-star radius ratio. 

We introduce a free parameter for the methane abundance because the methane 
abundance has a substantial effect on the observed part of the transmission spec- 
trum, but its abundance profile cannot be predicted reliably using self-consistent 
models. The dominant source of uncertainty in our estimates for the methane 
abundance for a given metallicity is introduced by our limited knowledge of the 
vertical pressure-temperature profile. The proximity of the expected temperature 
profile to the CH₄/CO transition® makes the methane abundance highly sensitive
to the model assumptions about the vertical distribution of short-wavelength absor- 
bers, vertical energy transport and day-night heat redistribution. Depending on 
whether the temperature in the photosphere is above or below the boundary at
which CO replaces CH₄ as the dominant carbon-bearing species, the methane
abundance can vary by several orders of magnitude. Disequilibrium effects such
as quenching and photochemistry at the upper end of the photosphere present an 
additional source of uncertainty. Our model determines the chemical composition 
of methane-reduced atmospheres by minimizing the Gibbs free energy while
simultaneously setting an upper limit on the methane abundance.

The other prominent absorber in the spectral range covered by our observations 
is water. We do not introduce an additional free parameter for water, however, 
because the water abundance in the photosphere can reliably be related to the metal- 
licity of the atmosphere through chemical equilibrium calculations. Disequilibrium 
chemistry models for this planet® indicate that quenching and photochemical 
effects affect the water abundance only at pressures less than about 1 bar, whereas 
our data are primarily sensitive to higher pressures. We include the Bond albedo as 
a free parameter to account for the uncertainty introduced by the effect of the 
albedo on the temperature-pressure profile and therefore on the atmospheric scale 
height. When calculating the significance with which the solar-metallicity,
cloud-free model is excluded we use the following definition™:

$$\frac{\chi^2_{\mathrm{obs}} - \nu}{\sqrt{2\nu}}$$

where ν is the number of degrees of freedom in the fit. We calculate a value of 48σ
from our final fits.
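
A minimal sketch of this significance estimate follows, assuming it is the usual normal approximation to the χ² distribution, (χ² − ν)/√(2ν), where χ² has mean ν and variance 2ν. The inputs are borrowed from the white-light fit quoted earlier (χ² = 356.7 with 360 points and 26 free parameters) only to show the arithmetic; they do not reproduce the significance quoted for the retrieval, and the function name is hypothetical.

```python
import math


def chi2_significance(chi2_obs, dof):
    """Number of standard deviations by which chi2_obs exceeds its expectation.

    Uses the normal approximation: chi-squared has mean dof and variance 2*dof.
    """
    return (chi2_obs - dof) / math.sqrt(2.0 * dof)


# Illustrative inputs: 360 data points minus 26 free parameters.
dof = 360 - 26
significance = chi2_significance(356.7, dof)
print(f"{significance:.2f} sigma")
```

A value below about one, as in this illustration, indicates a fit that is statistically acceptable, whereas a model that is strongly excluded yields a large positive value.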

We also consider scenarios with either subsolar (C/O = 0.3) or super-solar 
(C/O = 1.0) C to O ratios. We present the results from these retrievals in Extended 
Data Fig. 9. The oxygen-rich subsolar C to O case produces results that are virtually 
identical to our solar C to O analysis in Fig. 3. For the carbon-rich supersolar C to O
case, the high metallicity (>1,000 × solar) cloud-free scenarios are excluded because
they exhibit a CO absorption feature at 1.6 μm that appears to be inconsistent with
our measured transmission spectrum. The same carbon-rich models also appear to
allow a deeper (10× higher pressure) cloud deck for moderately metal-rich,
several-hundred-times-solar scenarios. However, carbon-rich models with C/O > 0.8 do
not provide a good fit to GJ 436b’s dayside emission spectrum, because increasing 
the C to O ratio tends to increase the amount of methane in the atmosphere’. 
Extended Data Figure 1 | Representative raw image from the 29 November 2012 UT observation, showing the scanned spectrum.
Extended Data Figure 2 | Raw white-light photometry for the four
individual transits. Data are vertically offset for clarity. Transits shown were
obtained on the following dates (from top to bottom): 26 October 2012 UT,
29 November 2012 UT, 10 December 2012 UT and 2 January 2013 UT. The
raw fluxes are shown as filled black circles, and the best-fit solutions for the
instrumental effects and transit light curves are shown as filled red circles.
Extended Data Figure 3 | White-light residuals. Data are vertically offset for
clarity. Transit residuals shown were obtained on the following dates (from top
to bottom): 26 October 2012 UT, 29 November 2012 UT, 10 December 2012 UT
and 2 January 2013 UT. The difference between the white-light fluxes and
best-fit model solutions are shown as filled black circles.
Extended Data Figure 4 | Individual transmission spectra for each of the
four visits. Transmission spectra are shown as filled circles, with colours
indicating the date of the observations: 26 October 2012 UT (dark blue), …
2 January 2013 UT (red). This plot shows the errors in the measured transit
depths, but does not include the additional systematic errors from the
limb-darkening models.
©2014 Macmillan Publishers Limited. All rights reserved




[Plot: observed minus calculated transit times versus transit centre (BJD − 2450000), with the horizontal axis running from 4,500 to 6,000.]

Extended Data Figure 5 | Observed minus calculated transit times using the
new best-fit ephemeris. The solid line denotes observed minus calculated
equal to zero. Transit times from this paper are plotted as filled stars, and
previously published observations are shown as filled circles, with 1σ
uncertainties overplotted for both. The colour of the points denotes the
wavelength of the observations (blue for visible, red for infrared). Transits
shown include all previously published observations for this planet**”**. The
figure is adapted from figure 8 of our previous study’.
Extended Data Figure 6 | Comparison to published transit depths for
GJ 436b. Filled black circles show previously published transit depths****"°,
with 1σ uncertainties overplotted. The white-light transit depths from our
WFC3 observations are plotted as black stars. We show three models for
comparison, including a solar-metallicity cloud-free model (red line), a
hydrogen-poor 1,900 × solar model (blue line), and a solar-metallicity model
with optically thick clouds at 1 mbar (grey line). As we discuss in ref. 3, the
apparent variations in transit depth at different epochs could plausibly be
explained by the occultation of active regions on the surface of the star. If
correct, this would make broadband photometry collected at different epochs
unreliable for the purpose of constraining the planet’s transmission spectrum.
p.p.m., parts per million.
… show the measured emission levels in the Ca II H and K line cores from Keck
High Resolution Echelle Spectrometer (HIRES) spectroscopy of GJ 436 (refs 3
and 46); these are parameterized using the S_HK index, where larger values …
the six most recent Spitzer transit observations (dashed line), as well as the four
HST transits presented in this paper (solid line).

… (January) lines. For comparison we show two PHOENIX stellar atmosphere … in the model spectra.
Extended Data Table 1 | Averaged differential transit depths


Wavelength (μm)   Depth (p.p.m.)   Measurement error (p.p.m.)   Error from limb darkening (p.p.m.)
1.136             6966             60                           7
1.155             6994             50                           10
1.174             6924             40                           1?
1.193             6872             57                           12
1.211             6968             39                           17
1.230             7046             38                           22
1.249             7036             39                           20
1.268             6967             35                           22
1.289             6989             35                           24
1.306             7043             38                           Ni
1.324             6989             38                           20
1.343             7046             42                           19
1.362             7057             aT                           25
1.381             7006             37                           25
1.400             7036             50                           27
1.419             7072             46                           46
1.438             7030             42                           46
1.456             7044             42                           44
1.475             6948             Bo                           44
1.494             7008             39                           49
1.513             7057             40                           5a
1.532             7022             44                           56
1.551             7018             40                           46
1.570             7010             oe                           Al
1.588             6959             40                           Al
1.607             6994             44                           34
1.626             6984             44                           30
1.645             6916             59                           55
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Clouds in the atmosphere of the super-Earth 
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Recent surveys have revealed that planets intermediate in size between 
Earth and Neptune (‘super-Earths’) are among the most common 
planets in the Galaxy’ *. Atmospheric studies are the next step towards 
developing a comprehensive understanding of this new class of object*®. 
Much effort has been focused on using transmission spectroscopy to 
characterize the atmosphere of the super-Earth archetype GJ 1214b 
(refs 7-17), but previous observations did not have sufficient pre- 
cision to distinguish between two interpretations for the atmosphere. 
The planet’s atmosphere could be dominated by relatively heavy 
molecules, such as water (for example, a 100 per cent water vapour 
composition), or it could contain high-altitude clouds that obscure 
its lower layers. Here we report a measurement of the transmission 
spectrum of GJ 1214b at near-infrared wavelengths that definitively 
resolves this ambiguity. The data, obtained with the Hubble Space 
Telescope, are sufficiently precise to detect absorption features from 
a high mean-molecular-mass atmosphere. The observed spectrum, 
however, is featureless. We rule out cloud-free atmospheric models 
with compositions dominated by water, methane, carbon monoxide, 
nitrogen or carbon dioxide at greater than 5σ confidence. The
planet’s atmosphere must contain clouds to be consistent with the data.

We observed 15 transits of the planet GJ 1214b with the Wide Field 
Camera 3 (WFC3) instrument on the Hubble Space Telescope (HST) 
between 27 September 2012 and 22 August 2013 Universal Time (UT). 
Each transit observation (visit) consisted of four orbits of the telescope, 
with 45-min gaps in phase coverage between target visibility periods 
due to Earth occultation. We obtained time-series spectroscopy from 
1.1 μm to 1.7 μm during each observation. The data were taken in spatial
scan mode, which slews the telescope during the exposure and moves 
the spectrum perpendicularly to the dispersion direction on the detector. 
This mode reduces the instrumental overhead time by a factor of five 
compared to staring mode observations. We achieved an integration 
efficiency of 60%-70%. We extracted the spectra and divided each 
exposure into five-pixel-wide bins, obtaining spectrophotometric time 
series in 22 channels (resolution R = λ/Δλ ~ 70). The typical signal-to-
noise ratio per 88.4-s exposure per channel was 1,400. We also created 
a ‘white’ light curve summed over the entire wavelength range. Our 
analysis incorporates data from 12 of the 15 transits observed, because 
one observation was compromised by a telescope guiding error and 
two showed evidence of a starspot crossing. 

The raw transit light curves for GJ 1214b exhibit ramp-like system- 
atics comparable to those seen in previous WFC3 data’®’*"°. The ramp 
in the first orbit of each visit consistently has the largest amplitude and 
a different shape from ramps in the subsequent orbits. Following stand- 
ard procedure for HST transit light curves, we did not include data 
from the first orbit in our analysis, leaving 654 exposures. We corrected 
for systematics in the remaining three orbits using two techniques that 
have been successfully applied in prior analyses'*’*”°. The first approach 
models the systematics as an analytic function of time. The function 


includes an exponential ramp term fitted to each orbit, a linear trend 
with time for each visit, and a normalization factor. The second approach 
assumes that the morphology of the systematics is independent of wave- 
length, and models each channel with a scalar multiple of the time series 
of systematics from the white-light curve fit. We obtained consistent 
results from both methods (see Extended Data Table 1), and report 
here results from the second. See the Supplementary Information and 
Extended Data Figs 1-6 for more detail on the observations, data reduc- 
tion and systematics correction. 
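
The two systematics corrections described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the functional form in `ramp_systematics` (a normalization `c`, a visit-long linear slope `v`, and an exponential ramp with parameters `a` and `b` within each orbit) is an assumed parameterization that matches the verbal description, and every function name, variable name and number below is hypothetical.

```python
import math

def ramp_systematics(t_visit, t_orbit, c, v, a, b):
    """Assumed model-ramp form: normalization x linear visit trend x exponential orbit ramp."""
    return [c * (1.0 + v * tv) * (1.0 - math.exp(-a * to - b))
            for tv, to in zip(t_visit, t_orbit)]

def systematics_vector(white_flux, white_transit_model):
    """Divide-white: Z = white-light data divided by the best-fit white-light transit model."""
    return [f / m for f, m in zip(white_flux, white_transit_model)]

def correct_channel(channel_flux, z, scale=1.0):
    """Remove wavelength-independent systematics using a scalar multiple of Z."""
    return [f / (scale * zi) for f, zi in zip(channel_flux, z)]

# Synthetic demonstration with made-up numbers (times in days).
t_visit = [0.00, 0.01, 0.02, 0.07, 0.08, 0.09]   # time since start of visit
t_orbit = [0.00, 0.01, 0.02, 0.00, 0.01, 0.02]   # time since start of each orbit
white_transit = [1.0, 1.0, 0.986, 0.986, 1.0, 1.0]
channel_transit = [1.0, 1.0, 0.9855, 0.9855, 1.0, 1.0]

syst = ramp_systematics(t_visit, t_orbit, c=1.0, v=-0.002, a=200.0, b=4.0)
white_flux = [s * m for s, m in zip(syst, white_transit)]
channel_flux = [0.05 * s * m for s, m in zip(syst, channel_transit)]

z = systematics_vector(white_flux, white_transit)
corrected = correct_channel(channel_flux, z, scale=0.05)
```

In this noiseless toy case the corrected channel reproduces its own transit shape, which is the premise of the second approach: the systematics are assumed to share one time dependence at all wavelengths.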

We fitted the light curves in each spectroscopic channel with a transit 
model”’ to measure the transit depth as a function of wavelength; this 
constitutes the transmission spectrum. See Fig. 1 for the fitted transit light 
curves. We used the second systematics correction technique described 
above and fitted a unique planet-to-star radius ratio R_p/R_s and normalization
C to each channel and each visit, and a unique linear limb-darkening
parameter u to each channel. We assumed a circular orbit and fixed
the inclination i to be 89.1°, the ratio of the semi-major axis to the stellar
radius a/R_s to be 15.23, the orbital period P to be 1.58040464894 days,
and the time of central transit T_c to be 2454966.52488 BJD_TDB. These
are the best-fit values to the white-light curve. 
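
As a quick check on the fixed orbital parameters, the sketch below derives the impact parameter and the total transit duration for a circular orbit, and evaluates the linear limb-darkening law. The inclination, a/R_s and period are the values quoted above; the radius ratio `RP_OVER_RS` is a hypothetical placeholder (the fitted values are not given in this section), and the function names are illustrative only.

```python
import math

def impact_parameter(a_over_rs, inclination_deg):
    """b = (a/R_s) cos i for a circular orbit."""
    return a_over_rs * math.cos(math.radians(inclination_deg))

def total_transit_duration(period_days, a_over_rs, inclination_deg, rp_over_rs):
    """First-to-fourth-contact duration (days) for a circular orbit."""
    b = impact_parameter(a_over_rs, inclination_deg)
    inc = math.radians(inclination_deg)
    chord = math.sqrt((1.0 + rp_over_rs) ** 2 - b ** 2)
    return period_days / math.pi * math.asin(chord / (a_over_rs * math.sin(inc)))

def linear_limb_darkening(mu, u):
    """Relative stellar intensity I(mu)/I(1) = 1 - u (1 - mu)."""
    return 1.0 - u * (1.0 - mu)

PERIOD = 1.58040464894   # days, from the text
A_OVER_RS = 15.23        # from the text
INCLINATION = 89.1       # degrees, from the text
RP_OVER_RS = 0.116       # hypothetical value, for illustration only

b = impact_parameter(A_OVER_RS, INCLINATION)
duration = total_transit_duration(PERIOD, A_OVER_RS, INCLINATION, RP_OVER_RS)
limb_intensity = linear_limb_darkening(0.0, 0.27)  # intensity at the stellar limb
print(f"b = {b:.3f}, duration = {duration * 24 * 60:.1f} min")
```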

The measured transit depths in each channel are consistent over all 
transit epochs (see Extended Data Fig. 5), and we report the weighted 
average depth per channel. The resulting transmission spectrum is 
shown in Fig. 2. Our results are not significantly affected by stellar 
activity, as we discuss further in the Supplementary Information. Careful 
treatment of the limb darkening is critical to the results, but our limb- 
darkening measurements are not degenerate with the transit depth (see 
Extended Data Fig. 4) and agree with the predictions from theoretical 
models (see Extended Data Fig. 6). Our conclusions are unchanged if 
we fix the limb darkening on theoretical values. We find that a linear 
limb-darkening law is sufficient to model the data. For further descrip- 
tion of the limb-darkening treatment, see the Supplementary Information. 
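
The per-channel averaging over transit epochs can be illustrated as follows. The text says only that a weighted average is reported; inverse-variance weighting is assumed here as the conventional choice, and the depths and errors in the example are made-up numbers.

```python
import math

def weighted_mean_depth(depths, errors):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * d for w, d in zip(weights, depths)) / total
    return mean, math.sqrt(1.0 / total)

# Hypothetical depths (p.p.m.) for one channel measured in three visits.
mean_depth, mean_error = weighted_mean_depth(
    [13400.0, 13460.0, 13430.0], [90.0, 90.0, 90.0])
```

With equal errors the result reduces to the plain mean, and the uncertainty shrinks as the square root of the number of visits, which is why combining many transits pays off.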

The transmission spectrum we report here has the precision necessary 
to detect the spectral features of a high-mean-molecular-mass atmo- 
sphere for the first time. However, the observed spectrum is featureless. 
The data are best-fitted with a flat line, which has a reduced χ² of 1.0.
We compare several models to the data that represent limiting-case
scenarios in the range of expected atmospheric compositions. Depending
on the formation history and evolution of the planet, a high-mean-molecular-mass
atmosphere could be dominated by water (H2O), methane
(CH4), carbon monoxide (CO), carbon dioxide (CO2) or nitrogen (N2).
Water is expected to be the dominant absorber in the wavelength range
of our observations, so a wide range of high-mean-molecular-mass atmospheres
with trace amounts of water can be approximated by a pure
H2O model. The data show no evidence for water absorption. A cloud-free
pure H2O composition is ruled out at 16.1σ confidence. In the case
of a dry atmosphere, features from other absorbers such as CH4, CO
or CO2 could be visible in the transmission spectrum. Cloud-free
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Figure 1 | Spectrophotometric data for transit observations of GJ 1214b. 
a, Normalized and systematics-corrected data (points) with best-fit transit models 
(lines), offset for clarity. The data consist of 12 transit observations and are binned 
in phase in 5-min increments. The spectroscopic light curve fit parameters are 
transit depth, a linear limb-darkening coefficient, and a normalization term to 
correct for systematics. A unique transit depth is determined for each observation 
and the measured transit depths are consistent from epoch to epoch in all channels. 
b, Binned residuals from the best-fit model light curves. The residuals are within 


atmospheres composed of these absorbers are also excluded by the
data, at 31.1σ, 7.5σ and 5.5σ confidence, respectively. Nitrogen has
no spectral features in the observed wavelength range, but our measurements
are sensitive to a nitrogen-rich atmosphere with trace amounts
of spectrally active molecules. For example, we can rule out a 99.9% N2,
0.1% H2O atmosphere at 5.6σ confidence. Of the scenarios considered
here, a 100% CO2 atmosphere is the most challenging to detect because
CO2 has the highest molecular mass and a relatively small opacity in
the observed wavelength range. Given that the data are precise enough
to rule out even a CO2 composition at high confidence, the most likely
explanation for the absence of spectral features is a grey opacity source, 
suggesting that clouds are present in the atmosphere. Clouds can block 
transmission of stellar flux through the atmosphere, which truncates 
spectral features arising from below the cloud altitude”. 
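
The flat-line comparison can be reproduced approximately from the tabulated spectrum. The sketch below fits a constant to the 22 relative transit depths and computes χ²; the numbers are transcribed from the divide-white column of Extended Data Table 1 and should be treated as illustrative, and the fitting routine is a plain inverse-variance estimate rather than the authors' code. Because the table values are rounded, the result should land near, but need not equal, the quoted χ² = 21.1 for 21 degrees of freedom.

```python
# Relative transit depths and 1-sigma errors (p.p.m.), divide-white technique,
# transcribed from Extended Data Table 1 (illustrative transcription).
depths = [-39, -28, 34, -48, 27, 5, 13, 14, 29, 2, 32,
          31, 5, 29, -8, 27, -11, 20, -21, -65, -17, -17]
errors = [31, 30, 30, 28, 28, 27, 27, 26, 26, 27, 27,
          27, 27, 29, 28, 28, 28, 28, 28, 28, 28, 30]

def flat_line_fit(depths, errors):
    """Best-fit constant level, chi-squared, and degrees of freedom."""
    weights = [1.0 / e ** 2 for e in errors]
    level = sum(w * d for w, d in zip(weights, depths)) / sum(weights)
    chi2 = sum(((d - level) / e) ** 2 for d, e in zip(depths, errors))
    dof = len(depths) - 1  # one fitted parameter: the constant level
    return level, chi2, dof

level, chi2, dof = flat_line_fit(depths, errors)
print(f"level = {level:.1f} p.p.m., chi2 = {chi2:.1f}, dof = {dof}")
```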

To illustrate the properties of potential clouds, we perform a Bayesian 
analysis on the transmission spectrum with a code designed for spectral 
retrieval of super-Earth atmospheric compositions”. We assume a two- 
component model atmosphere of water and a solar mix of hydrogen 
and helium gas, motivated by the fact that water is the most abundant 
icy volatile for solar abundance ratios. Clouds are modelled as a grey, 
optically thick opacity source below a given altitude. See Fig. 3 for the 
retrieval results. For this model, the data constrain the cloud-top pres- 
sure to less than 10 * mbar for a mixing ratio with mean molecular 
mass equal to solar and less than 10⁻¹ mbar for a water-dominated
composition (both at 3σ confidence). At the temperatures and pressures
expected in the atmosphere of GJ 1214b, equilibrium condensates of 
ZnS and KCl could form in the observable part of the atmosphere. 
Although these species could provide the necessary opacity, they are 


14% of the predicted photon-limited shot noise in all spectroscopic channels. The 
median observed root-mean-square in the spectroscopic channels is 315 p.p.m., 
before binning. c, Histograms of the unbinned residuals (coloured lines) compared 
to the expected photon noise (black lines). The residuals are Gaussian, satisfying a 
Shapiro-Wilk test for normality at the α = 0.1 level in all but one channel
(1.24 μm). The median reduced χ² value for the spectroscopic light curve fits
is 1.02.


predicted to form at much higher pressures (deeper than 10 mbar for a
50× solar metallicity model), requiring that clouds be lofted high
from their base altitude to explain the spectrum we measured. 
Alternatively, photochemistry could produce a layer of hydrocarbons 
in the upper atmosphere, analogous to the haze on Saturn’s moon 
Titan’*"*. 

The result presented here demonstrates the capability of current facil- 
ities to measure very precise spectra of exoplanets by combining many 
transit observations. This observational strategy has the potential to 
yield the atmospheric characterization of an Earth-size planet orbiting 
in the habitable zone of a small, nearby star. Transmission spectrum 
features probing five scale heights of a nitrogen-rich atmosphere on 
such a planet would have an amplitude of 30 parts per million (p.p.m.), 
which is comparable to the photon-limited measurement precision we 
obtained with the Hubble Space Telescope. However, our findings for 
the super-Earth archetype GJ 1214b, as well as emerging results for 
other exoplanets'*”°”’, suggest that clouds may exist across a wide range 
of planetary atmosphere compositions, temperatures and pressures. 
Clouds do not generally have constant opacity at all wavelengths, so 
further progress in this area can be made by obtaining high-precision 
data with broad spectral coverage. Another way forward is to focus on 
measuring exoplanet emission and reflection spectra during secondary 
eclipse, because the optical depth of clouds viewed at near-normal incid- 
ence is lower than that for the slant geometry observed during transit”. 
Fortunately, the next generation of large ground-based telescopes and 
the James Webb Space Telescope will have the capability to make such 
measurements, bringing us within reach of characterizing potentially 
habitable worlds beyond our Solar System. 
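
The contrast between hydrogen-rich and high-mean-molecular-mass atmospheres comes down to the pressure scale height, H = k_B T / (μ m_u g), because spectral features span a few scale heights. The sketch below evaluates H with the surface gravity (8.48 m s⁻²) and equilibrium temperature (580 K) assumed in the atmospheric modelling for GJ 1214b. The mean molecular masses (2.3 for a hydrogen/helium mix, 18.0 for water vapour) are round-number assumptions made for this illustration, and the function name is hypothetical.

```python
K_B = 1.380649e-23       # Boltzmann constant, J/K
M_U = 1.66053906660e-27  # atomic mass unit, kg

def scale_height(temperature_k, mean_molecular_mass, gravity):
    """Isothermal pressure scale height in metres."""
    return K_B * temperature_k / (mean_molecular_mass * M_U * gravity)

T_EQ = 580.0     # K, from the modelling assumptions
GRAVITY = 8.48   # m s^-2, from the modelling assumptions

h_hydrogen = scale_height(T_EQ, 2.3, GRAVITY)   # assumed H2/He-dominated mix
h_water = scale_height(T_EQ, 18.0, GRAVITY)     # assumed pure water vapour
print(f"H(H2/He) = {h_hydrogen / 1e3:.0f} km, H(H2O) = {h_water / 1e3:.0f} km")
```

Under these assumptions the water atmosphere has a scale height roughly eight times smaller, so its features are correspondingly weaker and demand the precision reached here.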
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Figure 2 | The transmission spectrum of GJ 1214b. a, Transmission 
spectrum measurements from our data (black points) and previous work (grey 
points), compared to theoretical models (lines). The error bars correspond
to 1σ uncertainties. Each data set is plotted relative to its mean. Our
measurements are consistent with past results for GJ 1214b using WFC3 

(ref. 10). Previous data rule out a cloud-free solar composition (orange line), 
but are consistent with either a high-mean-molecular-mass atmosphere 

(for example, 100% water, blue line) or a hydrogen-rich atmosphere with 
high-altitude clouds. b, Detailed view of our measured transmission spectrum 
(black points) compared to high-mean-molecular-mass models (lines). The 
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Figure 3 | Spectral retrieval results for a two-component (hydrogen/helium 
and water) model atmosphere for GJ 1214b. The colours indicate posterior 
probability density as a function of water mole fraction and cloud-top pressure. 
Black contours mark the 1σ, 2σ and 3σ Bayesian credible regions. Clouds
are modelled as having a grey opacity, with transmission truncated below
the cloud altitude. The atmospheric modelling assumes a surface gravity of
8.48 m s⁻² and an equilibrium temperature equal to 580 K.


error bars are 1σ uncertainties in the posterior distribution from a Markov
chain Monte Carlo fit to the light curves (see the Supplementary Information
for details of the fits). The coloured points correspond to the models binned at
the resolution of the observations. The data are consistent with a featureless
spectrum (χ² = 21.1 for 21 degrees of freedom), but inconsistent with cloud-free
high-mean-molecular-mass scenarios. Fits to pure water (blue line),
methane (green line), carbon monoxide (not shown), and carbon dioxide
(red line) models have χ² = 334.7, 1067.0, 110.0 and 75.4 with 21 degrees of
freedom, and are ruled out at 16.1σ, 31.1σ, 7.5σ and 5.5σ confidence,
respectively.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
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Extended Data Figure 1 | An example of a spatially scanned raw data frame. The exposure time was 88.4 s.
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Extended Data Figure 2 | An example of an extracted spectrum for an 88.4-s exposure. The dotted lines indicate the wavelength range over which we measure 
the transmission spectrum. 
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Extended Data Figure 3 | The broadband light curve fit from the first
transit observation. a, The raw broadband light curve. b, The broadband light curve corrected for
systematics using the model-ramp technique (points) and the best-fit model (line). c, Residuals from
the broadband light curve fit. d, The vector of systematics Z (see the Supplementary Information)
used in the divide-white technique.
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Extended Data Figure 4 | The posterior distributions for the divide-white fit
parameters for the 1.40-μm channel from the first transit observation. The histograms represent the
Markov chains for each parameter. The contour plots represent pairs of parameters, with lines
indicating the 1σ, 2σ and 3σ confidence intervals for the distribution. The normalization constant
is divided by its mean.
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Extended Data Figure 5 | Transit depths relative to the mean in 22 spectroscopic channels, for the 12 transits analysed. The black error bars indicate the 1σ
uncertainties determined by a Markov chain Monte Carlo fit.
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Extended Data Figure 6 | Fitted limb-darkening coefficients as a function
of wavelength (black points) and theoretical predictions for stellar atmospheres with a range of
temperatures (lines). The uncertainties are 1σ confidence intervals from a Markov chain Monte Carlo
fit. The temperature of GJ 1214 is estimated to be 3,250 K (ref. 22).
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Extended Data Table 1 | Derived parameters for the divide-white (d-w) and model-ramp (m-r) techniques 


Wavelength (μm)   Transit depth (p.p.m.)   Limb darkening              χ²_ν
                  d-w        m-r           d-w          m-r            d-w    m-r
1.135 – 1.158     -39 ± 31   6 ± 33        0.27 ± 0.01  0.28 ± 0.01    1.12   1.20
1.158 – 1.181     -28 ± 30   12 ± 32       0.26 ± 0.01  0.27 ± 0.01    1.01   1.24
1.181 – 1.204     34 ± 30    29 ± 30       0.25 ± 0.01  0.26 ± 0.01    1.04   1.44
1.205 – 1.228     -48 ± 28   -32 ± 29      0.26 ± 0.01  0.28 ± 0.01    0.90   1.22
1.228 – 1.251     27 ± 28    25 ± 29       0.26 ± 0.01  0.28 ± 0.01    0.85   1.29
1.251 – 1.274     5 ± 27     -6 ± 29       0.26 ± 0.01  0.26 ± 0.01    0.97   1.29
1.274 – 1.297     13 ± 27    12 ± 27       0.23 ± 0.01  0.23 ± 0.01    1.00   1.50
1.297 – 1.320     14 ± 26    0 ± 27        0.23 ± 0.01  0.25 ± 0.01    0.96   1.38
1.320 – 1.343     29 ± 26    2 ± 28        0.26 ± 0.01  0.27 ± 0.01    1.08   1.52
1.343 – 1.366     2 ± 27     -15 ± 28      0.30 ± 0.01  0.32 ± 0.01    0.99   1.44
1.366 – 1.389     32 ± 27    35 ± 26       0.28 ± 0.01  0.29 ± 0.01    0.97   1.42
1.389 – 1.412     31 ± 27    33 ± 28       0.28 ± 0.01  0.29 ± 0.01    0.96   1.39
1.412 – 1.435     5 ± 27     -33 ± 28      0.29 ± 0.01  0.31 ± 0.01    1.15   1.51
1.435 – 1.458     29 ± 29    17 ± 28       0.29 ± 0.01  0.30 ± 0.01    1.01   1.39
1.458 – 1.481     -8 ± 28    1 ± 29        0.32 ± 0.01  0.33 ± 0.01    1.01   1.33
1.481 – 1.504     27 ± 28    28 ± 28       0.28 ± 0.01  0.29 ± 0.01    0.94   1.37
1.504 – 1.527     -11 ± 28   -23 ± 29      0.27 ± 0.01  0.29 ± 0.01    1.15   1.58
1.527 – 1.550     20 ± 28    1 ± 29        0.27 ± 0.01  0.29 ± 0.01    1.17   1.56
1.550 – 1.573     -21 ± 28   0 ± 28        0.28 ± 0.01  0.29 ± 0.01    1.20   1.62
1.573 – 1.596     -65 ± 28   -62 ± 30      0.26 ± 0.01  0.28 ± 0.01    1.08   1.46
1.596 – 1.619     -17 ± 28   ± 29          0.26 ± 0.01  0.27 ± 0.01    1.34   1.69
1.619 – 1.642     -17 ± 30   -26 ± 30      0.22 ± 0.01  0.24 ± 0.01    1.16   1.59
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DNA-mediated nanoparticle crystallization into Wulff polyhedra

Evelyn Auyeung, Ting I. N. G. Li, Andrew J. Senesi, Abrin L. Schmucker, Bridget C. Pals, Monica Olvera de la Cruz & Chad A. Mirkin


Crystallization is a fundamental and ubiquitous process much 
studied over the centuries. But although the crystallization of 
atoms is fairly well understood”, it remains challenging to predict 
reliably the outcome of molecular crystallization processes that are 
complicated by various molecular interactions and solvent involve- 
ment. This difficulty also applies to nanoparticles: high-quality 
three-dimensional crystals*° are mostly produced using drying 
and sedimentation techniques that are often impossible to ration- 
alize and control to give a desired crystal symmetry, lattice spacing 
and habit (crystal shape). In principle, DNA-mediated assembly of 
nanoparticles offers an ideal opportunity for studying nanoparticle 
crystallization’ "’: a well-defined set of rules have been developed to 
target desired lattice symmetries and lattice constants*”"*, and the 
occurrence of features such as grain boundaries and twinning in 
DNA superlattices and traditional crystals comprised of molecular or 
atomic building blocks suggests that similar principles govern their 
crystallization. But the presence of charged biomolecules, interpar- 
ticle spacings of tens of nanometres, and the realization so far of only 
polycrystalline DNA-interconnected nanoparticle superlattices, all 
suggest that DNA-guided crystallization may differ from traditional 
crystal growth. Here we show that very slow cooling, over several 
days, of solutions of complementary-DNA-modified nanoparticles 
through the melting temperature of the system gives the thermody- 
namic product with a specific and uniform crystal habit. We find that 
our nanoparticle assemblies have the Wulff equilibrium crystal struc- 
ture that is predicted from theoretical considerations and molecular 
dynamics simulations, thus establishing that DNA hybridization can 
direct nanoparticle assembly along a pathway that mimics atomic 
crystallization. 

The crystallization of nanoparticles mediated by DNA typically involves 
initial assembly of a disordered aggregate, which upon thermal annealing 
slightly below its melting temperature transforms into an ordered super- 
lattice (Fig. 1a, blue arrows)’”. Transmission electron microscopy (TEM) 
images and the presence of rings in the small-angle X-ray scattering 
(SAXS) data show that all superlattices formed using this approach 
thus far are polycrystalline, with ordered micrometre-sized domains 
randomly oriented with respect to one another”’’. Considering that 
traditional crystallization techniques for atoms and molecules typically 
rely on slow cooling through the melting temperature”’, we hypothe- 
sized that such a slow cooling approach applied to DNA-based assembly 
strategies might yield faceted crystals. DNA-functionalized nanopar- 
ticle solutions were therefore heated to above the melting temperature 
of the DNA links designed to connect particles and then slowly cooled 
to room temperature (Fig. 1a, red arrows), in a process which typically
took two to three days to complete. A key parameter known to the 
researcher before doing the experiments is the aggregate melting tem- 
perature, which is well defined and directly correlated with the nucleic 
acid sequences used for assembly’’. 

The slow cooling of the combination of two sets of gold nanoparti- 
cles functionalized with complementary DNA linker strands’ produces 


superlattices with the expected body-centred cubic (b.c.c.) packing 
when using 20-nm gold nanoparticles and CsCl packing when using 
20-nm and 15-nm gold nanoparticles, as confirmed by the radially 
averaged one-dimensional SAXS data (Fig. 1b, (i) and (ii)). To enable 
direct visualization, the superlattices were also stabilized by embedding 
in silica’’. TEM images of these structures reveal uniform crystals with 
square- and hexagonal-shaped domains for both the b.c.c. (Fig. 1c, (i)
and (ii)) and the CsCl (Fig. 1c, (iii) and (iv)) particle packing symmet- 
ries, whereas scanning electron microscopy (SEM) allows us to observe 
surface features and the overall crystal habit (Fig. 1d). Evidently, the 
slow-cooling process enables DNA-driven assembly and crystalliza- 
tion that favours faceted rhombic dodecahedron microcrystals over 
the polycrystalline assemblies obtained by annealing below the melting 
temperature. Although single-crystal formation by annealing below 
the melting temperature may in principle be possible, the kinetics of 
reorganization from an irregularly shaped crystal into a well-defined 
microcrystal are likely to be too slow to be observed experimentally. 

SEM images of the microcrystals in different orientations on the 
substrate are all consistent with rhombic dodecahedron formation 
(Fig. 2a). Closer inspection of one of the crystals (Fig. 2a, bottom) reveals 
extraordinarily well ordered nanoparticles at the surface, as well as 
the presence of common surface defects including ‘particle adatoms’ 
(a surface defect in which an atom is adsorbed on the surface of a crystal
plane) and step edges. The nanoparticle orientation is consistent with
a crystal that is enclosed by (110) planes, as expected for rhombic
dodecahedra, and which is also the closest-packed plane in a b.c.c. unit
cell. A tilting experiment was conducted in the TEM on a single microcrystal
to observe the different morphologies that are consistent with
a rhombic dodecahedron crystal habit. It is important to note that 
although these microcrystals exhibited a wide size distribution (Fig. 2c), 
faceted crystals were the predominant product of slow cooling, and 
no shapes other than rhombic dodecahedron microcrystals were 
observed. 

In contrast to several prior examples of microcrystals grown from 
nanoparticle building blocks, in which the overall crystal shape was 
largely dependent on factors including nanoparticle size and length of 
the ligand shell’*”’, the shape of the microcrystals we report here was 
fairly independent of such parameters: microcrystals made from 5-nm, 
10-nm and 20-nm gold nanoparticles all exhibit overall rhombic 
dodecahedron shapes and b.c.c. packing with lattice parameters of 
25.7 nm, 29.1 nm and 39.5 nm, respectively (compare Fig. 3a, b and
d). Oligonucleotide length was kept constant in these experiments. 
Furthermore, rhombic dodecahedron microcrystals were observed 
for a binary system consisting of 20-nm and 15-nm gold nanoparticles 
arranged in a CsCl lattice symmetry (Fig. 3c). Thus, we conclude that 
the rhombic dodecahedron is the thermodynamically most favourable 
crystal shape for this system over a range of particle sizes and inter- 
particle distances. Furthermore, molecular dynamics simulations on a 
colloid model predicted a rhombic dodecahedron equilibrium crystal 
structure, fully consistent with experimental observations (Fig. 3e). 
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Figure 1 | Superlattices formed by the slow-cooling method. a, Two 
approaches for DNA-mediated nanoparticle crystallization are designated by 
arrows on a thermal melting curve of a DNA-linked gold nanoparticle 
aggregate. The traditional method (blue arrow), in which the aggregate is 
annealed a few degrees below the melting temperature, produces 
polycrystalline superlattices with no defined shape. The slow-cooling method 
(red arrow), in which the aggregate is heated above its melting temperature and 
cooled at a rate of 0.01 °C min⁻¹, produced well-defined, faceted microcrystals
in each of the dozens of experiments conducted using these conditions. 
Extinction wavelength measured from the ultraviolet—visible spectrum, 520 nm 
(a.u.). b, Representative one-dimensional (top) and two-dimensional (bottom) 


The formation of rhombic dodecahedron microcrystals from a 
b.c.c. packing of nanoparticles can be rationalized in terms of the 
surface energy γ of the exposed facets. Rhombic dodecahedra are
enclosed by (110) facets (bottom panel of Fig. 2a), which is the closest-
packed plane for a b.c.c. or CsCl lattice. When using the standard
broken-bond model approximation for surface energy, exposing the
closest-packed plane is thermodynamically favoured: it requires breaking
the smallest number of particle-to-particle interactions per unit
area and thus exposes the lowest-surface-energy facet. From this
model, the relative surface energies for b.c.c. metal facets should
exhibit a ratio of γ(110):γ(111):γ(100) = 1:1.22:1.41. Similarly, the relative
surface energies for face-centred cubic (f.c.c.) metal facets should be 

SAXS data for b.c.c. (i) and CsCl (ii) superlattices synthesized from the slow- 
cooling technique. In the one-dimensional data, the red trace is the 
experimentally obtained scattering pattern and the black trace is the predicted 
scattering pattern for a perfect lattice. SAXS data are shown as plots of 
superlattice structure factor (S(q), in arbitrary units) versus scattering vector (q, 
in units of Å⁻¹). c, TEM images of shape-controlled b.c.c. (i)-(ii) and CsCl (iii)-
(iv) microcrystals. Scale bars are 1.5 μm for (i), 1.0 μm for (ii), 0.5 μm for (iii)
and 0.5 μm for (iv). d, SEM image of a representative b.c.c. microcrystal with
visible faceting where constituent nanoparticles can readily be seen (20-nm
gold nanoparticles shown; scale bar is 1 μm). The inset shows a high-
magnification (×52,500) view of the crystal facet with labelled surface defects.


γ(111):γ(100):γ(110) = 1:1.15:1.22. These calculations thus predict the
Wulff polyhedron, the equilibrium crystal structure, to be a rhombic 
dodecahedron enclosed by (110) facets for a b.c.c. metal and a trun- 
cated octahedron enclosed by (111) and (100) facets for a f.c.c. metal’. 
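
The broken-bond ratios quoted above can be checked numerically. In the sketch below, the number of nearest-neighbour bonds cut per unit area of a plane with unit normal n is taken to be proportional to the sum of b·n over all bond vectors b pointing out of the plane; the atomic density is a common factor within one lattice and cancels in the ratios. Restricting the count to nearest neighbours is an assumption of this illustration, and the function names are hypothetical.

```python
import math
from itertools import permutations, product

def nearest_neighbour_bonds(lattice):
    """Nearest-neighbour bond vectors, in units of the cubic lattice parameter."""
    if lattice == "bcc":
        return [tuple(0.5 * s for s in signs) for signs in product((1, -1), repeat=3)]
    if lattice == "fcc":
        bonds = set()
        for s1, s2 in product((1, -1), repeat=2):
            bonds.update(permutations((0.5 * s1, 0.5 * s2, 0.0)))
        return sorted(bonds)
    raise ValueError(f"unknown lattice: {lattice}")

def broken_bond_density(lattice, miller):
    """Relative number of broken nearest-neighbour bonds per unit area of (hkl)."""
    norm = math.sqrt(sum(h * h for h in miller))
    normal = [h / norm for h in miller]
    total = 0.0
    for bond in nearest_neighbour_bonds(lattice):
        projection = sum(b * n for b, n in zip(bond, normal))
        if projection > 1e-12:  # count only bonds that cross the plane outwards
            total += projection
    return total

def relative_energies(lattice, reference, others):
    """Surface energies of the planes in `others` relative to `reference`."""
    ref = broken_bond_density(lattice, reference)
    return [broken_bond_density(lattice, m) / ref for m in others]

bcc_ratios = relative_energies("bcc", (1, 1, 0), [(1, 1, 1), (1, 0, 0)])
fcc_ratios = relative_energies("fcc", (1, 1, 1), [(1, 0, 0), (1, 1, 0)])
print([round(r, 2) for r in bcc_ratios], [round(r, 2) for r in fcc_ratios])
```

Rounded to two decimals, the b.c.c. and f.c.c. values agree with the 1:1.22:1.41 and 1:1.15:1.22 ratios given in the text.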

In many systems the expected Wulff polyhedron is not always 
formed, and the validity of the assumptions and approximations made 
must be analysed for each individual case. We therefore calculated 
actual surface energy values for our DNA-nanoparticle system using 
recently developed molecular dynamics simulations that accurately 
predict the crystallization behaviour of DNA-assembled nanoparticles 
(see Supplementary Information)**”. In these calculations, the surface 
energy is defined as the excess energy at the surface of a material 
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Figure 3 | Rhombic dodecahedron microcrystals with varying unit cell 
compositions. a-c, TEM and SEM images of rhombic dodecahedron 
microcrystals synthesized from a b.c.c. lattice of 5-nm gold nanoparticles (scale 
bars, left to right, are 1 μm, 1 μm and 2 μm) (a), from a b.c.c. lattice of 10-nm
gold nanoparticles (scale bars, left to right, are 1 μm, 2 μm and 4 μm) (b) and
from a CsCl lattice of 20-nm and 15-nm gold nanoparticles (scale bars, left to
right, are 0.5 μm, 1 μm and 2 μm) (c). d, SAXS data for a b.c.c. crystal made




Figure 2 | Structural determination and electron microscope observation of 20-nm
gold nanoparticle microcrystals. a, SEM images of rhombic dodecahedron microcrystals
viewed from various orientations. A schematic representation of each crystal
orientation is shown on the top right corner of each of the top four images (scale
bars, clockwise starting from top left, are 2 μm, 1 μm, 1 μm and 2 μm). At the
bottom, a close-up view of the region enclosed by the orange box of one of the
crystals reveals a nanoparticle orientation consistent with the (110) facet (scale
bar is 1 μm). b, A TEM tilting experiment on a single microcrystal showing the
square- and hexagonal-type shapes of the crystal when viewed at different angles in
transmission mode. All scale bars are 2 μm. c, A TEM image showing the size
variation and shape uniformity of the rhombic dodecahedron microcrystals (both
scale bars are 5 μm).
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from 5-nm (black trace), 10-nm (red trace), and 20-nm (blue trace) gold 
nanoparticles. First-order scattering peak q0 and corresponding lattice
parameter values are indicated next to the respective scattering pattern for each 
crystal. e, Molecular dynamics simulation of a binary set of particles exhibiting 
interactions modelled for the DNA-gold nanoparticle system produces a 
rhombic dodecahedron microcrystal that is consistent with experimental 
observations. 
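
The link between the first-order SAXS peak and the lattice parameter can be made explicit. For a b.c.c. lattice the first allowed reflection is (110), so q0 = 2π√2/a. The sketch below applies this standard relation to the lattice parameters reported in the text; it assumes the quoted first-order peak is the (110) reflection, and the function names are hypothetical. No measured q0 values are taken from the source.

```python
import math

def bcc_first_peak(lattice_parameter_nm):
    """Position of the (110) reflection, in nm^-1, for a b.c.c. lattice."""
    return 2.0 * math.pi * math.sqrt(2.0) / lattice_parameter_nm

def bcc_lattice_parameter(q0_per_nm):
    """Lattice parameter in nm from the first-order (110) peak position."""
    return 2.0 * math.pi * math.sqrt(2.0) / q0_per_nm

def bcc_nearest_neighbour_distance(lattice_parameter_nm):
    """Centre-to-centre distance between nearest particles in a b.c.c. lattice."""
    return lattice_parameter_nm * math.sqrt(3.0) / 2.0

for a_nm in (25.7, 29.1, 39.5):  # lattice parameters quoted in the text
    q0 = bcc_first_peak(a_nm)
    d_nn = bcc_nearest_neighbour_distance(a_nm)
    print(f"a = {a_nm} nm: q0 = {q0 / 10:.4f} per angstrom, spacing = {d_nn:.1f} nm")
```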
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Figure 4 | Method for calculating surface energy values for a b.c.c. DNA- 
gold nanoparticle superlattice. Periodic boundary conditions along the z-axis 
are removed from the bulk crystal to expose the surface of interest, for example, 


compared to the energy of the bulk system. To calculate γ, periodic
boundary conditions were removed from the modelled bulk crystal
along the z-axis to expose the facet of interest on two sides (Fig. 4). The
energy of the bulk crystal E_bulk was subtracted from the energy of
the exposed facet E_surface exposed, and then divided by twice the area
(because two surfaces were exposed and the surface charge density is
close to zero) to give the surface energy γ of the exposed facet.
Table 1 summarizes absolute surface-energy values for the facets of a
b.c.c. crystal and a f.c.c. crystal consisting of DNA-assembled nanoparticles
calculated using this model, with the binding strength of complementary
sticky ends scaled to 42.3 kJ mol⁻¹ (ref. 27) to match the
strength of the DNA sequences (TTCCTT) used in our experiments.
For the b.c.c. system, the calculated ratios of γ(100):γ(110) = 1.46 ± 0.02
and γ(111):γ(110) = 1.24 ± 0.02 are in good agreement with the theoretically
predicted ratio described above. Evidently, the observation of uniform 
rhombic dodecahedron crystals from a b.c.c. arrangement of nanopar- 
ticles follows the crystallization behaviour expected for a b.c.c. arrange- 
ment of atoms”. The expected Wulff polyhedron was observed for the 
b.c.c. nanoparticle system, but no truncated octahedra or other uniform 
shapes were observed in either experiment or simulation among the 
faceted crystals obtained for the f.c.c. system. This is probably because 
the surface energies of the two most stable surfaces in a f.c.c. crystal are 
too close in energy (predicted and calculated ratio γ(100):γ(111) = 1.15)
for one to be favoured predominantly over the other (Table 1). Further- 
more, the SAXS data for the f.c.c. crystals showed evidence of stacking 
faults and twinning in the lattice structure, defects which may have 
prevented the formation of uniform crystal shapes (a more in-depth 
discussion of the f.c.c. system can be found in the Supplementary Infor- 
mation). Nonetheless, the consistency between the experimental obser- 
vations and the simulation results provides convincing evidence that 
the broken-bond approximation used for describing surface energy 


[Figure 4 panel labels: two surfaces exposed; (110); (100)]


(110) or (100) as shown. Absolute surface energy values calculated from this
approach are found in Table 1. γ = (E_surface exposed − E_bulk)/(2 × area).


and crystal growth for atomic systems can similarly be used to describe 
the crystallization of nanoparticles using DNA interactions; and, 
hence, that the DNA-guided assembly of nanoparticles provides a 
nanometre-scale analogue to the crystallization behaviour exhibited 
by atomic crystals. 
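
The surface-energy bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the function and variable names are hypothetical, and the only numbers used are the Table 1 surface energies. It recomputes the facet ratios quoted in the text from those tabulated values.

```python
def surface_energy(e_surface_exposed, e_bulk, area):
    """gamma = (E_surface_exposed - E_bulk) / (2 * area).

    The factor of 2 accounts for the two facets exposed when the periodic
    boundary along z is removed. Units follow whatever the inputs use.
    """
    if area <= 0:
        raise ValueError("area must be positive")
    return (e_surface_exposed - e_bulk) / (2.0 * area)


def ratios(gammas, reference):
    """Express every facet energy relative to the chosen reference facet."""
    return {facet: g / gammas[reference] for facet, g in gammas.items()}


# Mean surface energies from Table 1 (mJ m^-2); uncertainties are omitted here.
BCC = {"100": 0.548, "110": 0.375, "111": 0.464}
FCC = {"100": 0.094, "110": 0.104, "111": 0.082}

bcc_ratios = ratios(BCC, "110")  # (110) is the lowest-energy b.c.c. facet
fcc_ratios = ratios(FCC, "111")  # (111) is the lowest-energy f.c.c. facet

for name, table in (("b.c.c.", bcc_ratios), ("f.c.c.", fcc_ratios)):
    print(name, {facet: round(r, 2) for facet, r in table.items()})
```

The small f.c.c. spread between (100) and (111) is the quantity the text invokes to explain why no single uniform habit dominated in that system.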

The experimental observation of the Wulff equilibrium crystal struc- 
ture, coupled with computational models, demonstrates the utility of 
DNA for controlling not only the recognition properties and surface 
energy of individual nanoparticles, but also the surface energies of the 
macroscopic nanoparticle assembly in such a way that a specific struc-
ture can be deliberately programmed and realized in the laboratory.
The challenge now for both the experimental and theoretical com- 
munities is to build on the principles we have described here to identify 
and synthesize crystal habits that maximize surface-energy differences, 
and to create single microcrystals with useful properties that may find
practical use, for example in photonic and catalytic applications.


METHODS SUMMARY 


All oligonucleotides used in this work were synthesized on a solid-support MM48 
synthesizer using reagents purchased from Glen Research. Sequences can be found 
in the Supplementary Information. Nanoparticles were functionalized and assembled 
according to published literature protocols. After particle assembly, slow cooling to 
room temperature was conducted in a temperature cycler (Life Technologies) at a 
starting temperature above the aggregate melting temperature and typically at a 
rate of 0.01 °C min⁻¹ unless otherwise specified. Superlattices were characterized
by synchrotron SAXS experiments conducted at the Advanced Photon Source at 
Argonne National Laboratory. Superlattices were transferred to the solid state 
using a silica embedding method”” for visualization by TEM (Hitachi HD2300) 
and SEM (Hitachi SU8030). To reproduce the shapes with molecular dynamics 
simulations, a colloidal model was validated by computing the interaction poten- 
tial with simulations**”° with explicit DNA chains and simulations were performed 
with the LAMMPS package (available at http://lammps.sandia.gov/). See the 


Table 1 | Surface energy values calculated for DNA-gold nanoparticle superlattices

| Structure | System | Relaxed first-neighbour distance (nm) | Lattice constant (nm) | (100) (mJ m⁻²) | (110) (mJ m⁻²) | (111) (mJ m⁻²) | Ratios of surface energies |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Body-centred cubic | 20-nm gold nanoparticle, 150 strands, 18 bp | 34.0 | 39.3 (simulation); 39.5 (experiment) | 0.548 ± 0.005 | 0.375 ± 0.005 | 0.464 ± 0.003 | (110):(100):(111) = 1 : 1.46 ± 0.02 : 1.24 ± 0.02 |
| Face-centred cubic | 20-nm gold nanoparticle, 150 strands, 43 bp | 47.8 | 67.6 (simulation); 67.4 (experiment) | 0.094 ± 0.006 | 0.104 ± 0.002 | 0.082 ± 0.005 | (111):(100):(110) = 1 : 1.15 ± 0.10 : 1.27 ± 0.08 |

Surface-energy values consider only the contribution of DNA hybridization. As a reference, the surface energy of water is 72 mJ m⁻². bp, number of DNA base pairs. The stated values and errors indicate the average
and associated standard deviation of approximately five independent simulation runs with random initial conditions. The relaxed lattice parameter from simulation is listed first; the lattice
parameter from experiment is listed second.
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Supplementary Videos for a visualization of the simulated formation of micro- 
crystals from a b.c.c. and a f.c.c. system of interacting particles modelled as a single 
bead. To estimate surface-energy values, a scale-accurate coarse-grained model 
was used™ and molecular dynamics simulations were performed on the HOOMD- 
Blue package (available at http://codeblue.umich.edu/hoomd-blue). More molecular 
dynamics simulation details and assumptions can be found in the Supplementary 
Information. 
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Increasing subtropical North Pacific Ocean nitrogen 
fixation since the Little Ice Age 


Owen A. Sherwood, Thomas P. Guilderson, Fabian C. Batista, John T. Schiff & Matthew D. McCarthy


The North Pacific subtropical gyre (NPSG) plays a major part in the
export of carbon and other nutrients to the deep ocean. Primary
production in the NPSG has increased in recent decades despite a
reduction in nutrient supply to surface waters. It is thought that
this apparent paradox can be explained by a shift in plankton com-
munity structure from mostly eukaryotes to mostly nitrogen-fixing
prokaryotes. It remains uncertain, however, whether the plankton
community domain shift can be linked to cyclical climate variability
or a long-term global warming trend. Here we analyse records of
bulk and amino-acid-specific ¹⁵N/¹⁴N isotopic ratios (δ¹⁵N) pre-
served in the skeletons of long-lived deep-sea proteinaceous corals
collected from the Hawaiian archipelago; these isotopic records
serve as a proxy for the source of nitrogen-supported export pro-
duction through time. We find that the recent increase in nitrogen
fixation is the continuation of a much larger, centennial-scale trend.
After a millennium of relatively minor fluctuation, δ¹⁵N decreases
between 1850 and the present. The total shift in δ¹⁵N of −2 per mil
over this period is comparable to the total change in global mean
sedimentary δ¹⁵N across the Pleistocene-Holocene transition, but it
is happening an order of magnitude faster. We use a steady-state
model and find that the isotopic mass balance between nitrate and
nitrogen fixation implies a 17 to 27 per cent increase in nitrogen fix-
ation over this time period. A comparison with independent records
suggests that the increase in nitrogen fixation might be linked to
Northern Hemisphere climate change since the end of the Little Ice Age.
Recent satellite observations have shown that globally, the perma-
nently oligotrophic subtropical gyres are expanding at a rate of 1% to
4% per year, generally commensurate with global decreases in net primary
productivity. In contrast to the global trend, primary production in
the NPSG (Fig. 1) has actually increased in recent decades, even as
surface waters have become more nutrient-limited (Extended Data
Fig. 1; Extended Data Table 1). Nitrogen (N₂)-fixing prokaryotes,
which use the inexhaustible supply of dissolved N₂ in surface waters,
are a key factor in this apparent paradox. This increase in productivity
has been accompanied by a 'domain shift' from a dominantly eukar-
yotic to dominantly prokaryotic plankton community. Such ecosystem
shifts, with their impact on the export of nutrients to the deep ocean
and their sensitivity to past and future climate change, represent major
unresolved problems in ocean and global biogeochemical cycles.
Oceanographic monitoring at station ALOHA (22° 45′ N, 158° W;
Fig. 1) in the NPSG has suggested a new theory about the dynamic
nature of subtropical ocean gyres on seasonal to decadal timescales.
Although the NPSG was once considered static and largely unrespon-
sive to climate forcing, it is now apparent that its physical and bio-
logical oceanographic variability may be coupled to cyclical climate
phenomena such as the North Pacific Gyre Oscillation. Therefore, it
is not clear whether recent observed changes in subtropical gyre areal
extent or the plankton community domain shift at station ALOHA are
the result of cyclic or anomalous (global warming) climate forcing. It
is thus imperative to understand these processes over longer timescales
than are currently available from the observational record. However,
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Figure 1 | NPSG circulation and N* distribution with sample locations. 
Contours show climatological mean dynamic height (in units of m² s⁻² relative
to the 1,000-dbar level) with arrows showing direction of geostrophic flow.
The anticyclonic NPSG lies approximately west of the 1.6 m² s⁻² contour,
between 10° N and 40° N. The colour scale shows the distribution of N*
(N* = N − 16P + 2.9 µmol kg⁻¹, where P is phosphorus) (ref. 30). Positive N*
indicates where the oceanic N inventory is increased by N₂ fixation; negative N*
indicates where nitrogen is lost to water column denitrification. The lower
panel shows K. haumeaae sampling locations—Cross Seamount (C), Makapuu 
(M) and French Frigate Shoals (F)—and the location of oceanographic station 
ALOHA (A). 


traditional palaeo-archives from sediment cores cannot provide mean- 
ingful longer-term context, because the entire Holocene history of the 
NPSG is represented in less than 10 cm of bioturbated sediment. 

Here, we use the unique geochemical time-series data encoded in the 
skeletons of extraordinarily long-lived, deep-sea corals to reconstruct the 
first detailed biogeochemical proxy records of the NPSG. The Hawaiian 
gold coral Kulamanamana haumeaae Sinniger et al., 2013 (previously 
known as Gerardia sp.), secretes a proteinaceous skeleton synthesized 
from its primary food source: recently exported sinking particles’. 
These growth-layered, decay-resistant skeletons therefore record the 
biogeochemical signatures of exported production, in a manner ana- 
logous to ‘living sediment traps’, with annual- to decadal-scale reso- 
lution over millennial timescales'*”’. 

We have generated records of skeletal bulk δ¹⁵N (δ¹⁵N_bulk) from
specimens of K. haumeaae collected from three sites in the Hawaiian
archipelago (Fig. 1). These records all show that from around 1000 AD
to 1850 AD, δ¹⁵N_bulk showed no long-term, secular trend (Fig. 2a). Then,
beginning around the end of the Little Ice Age (around 1850 AD), δ¹⁵N_bulk
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Figure 2 | NPSG 5"°N proxy records and their relationship to climate 
change. a, Records of K. haumeaae δ¹⁵N_bulk (solid lines show the three-point
running average, analytical error <0.15‰) and δ¹⁵N_Phe (corresponding orange
and blue circles, analytical error <1.0‰). Abbreviations (M, C, F) as in
Fig. 1. Percentage N₂ fixation based on mixing between NO₃⁻ and N₂ fixation
shown on separate axis. b, Petrel bone collagen δ¹⁵N records. For clarity,
records are shown for Hawaii and Maui breeding populations only.
Error bars are 1 s.e.m., with sample size indicated. c, Northern Hemisphere
temperature reconstruction (ref. 8; black line with 95% confidence envelope)
and tropical Pacific sea surface temperature reconstruction (ref. 7; red line).
d, Composite dust record from Himalayan ice cores (green line with 95%
confidence envelope).


began to decrease dramatically and monotonically to levels not seen at
any point over the mid- to late Holocene epoch. The timing and mag-
nitude of the decrease in δ¹⁵N_bulk are similar to recently reported records
of δ¹⁵N preserved in the bone collagen of Hawaiian petrels (Pterodroma
sandwichensis) (Fig. 2b), suggesting that the observed changes in δ¹⁵N
are linked across multiple trophic levels. The most recent parts of these
records overlap with instrumental data, over which time the NPSG has
become increasingly nutrient-limited and more favourable to N₂-fixing
diazotrophs (Extended Data Fig. 1; Extended Data Table 1).

Values of δ¹⁵N_bulk can be difficult to interpret directly: bulk values
represent the effects of baseline nitrogen-source signatures combined
with subsequent alterations due to trophic transfer. Recently, analysis
of the δ¹⁵N of individual amino acids (δ¹⁵N_AA) has emerged as a power-
ful new tool with which to decouple and unambiguously examine these
effects independently of one another. In heterotrophic organisms,
amino acids fall into two isotopically distinct groups (Extended Data
Fig. 2). One group, the 'source' amino acids (SrcAA in Table 1), remain
little changed through successive trophic levels of a food web. In con-
trast, 'trophic' amino acids (TrAA in Table 1) become significantly
enriched in ¹⁵N with each successive trophic transfer.

To distinguish the effects of baseline nitrogen-source sig-
natures from subsequent trophic alterations, we measured δ¹⁵N_AA on a
subset of the samples spanning the bulk records. The values for individual
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Table 1 | Correlation of 57°N,, parameters with 57°NyuiK 


| Group | Parameter | Slope | Standard error | r² | P value |
| --- | --- | --- | --- | --- | --- |
| SrcAA | Gly | 0.99 | 0.31 | 0.36 | 0.005 |
| SrcAA | Ser | 1.10 | 0.34 | 0.36 | 0.005 |
| SrcAA | Lys | 0.88 | 0.19 | 0.56 | <0.001 |
| SrcAA | Tyr | 0.76 | 0.31 | 0.27 | 0.026 |
| SrcAA | Phe | 1.06 | 0.13 | 0.80 | <0.001 |
| SrcAA | Average SrcAA | 1.00 | 0.13 | 0.78 | <0.001 |
| TrAA | Glx | 0.92 | 0.43 | 0.20 | 0.049 |
| TrAA | Asx | 0.76 | 0.26 | 0.32 | 0.009 |
| TrAA | Ala | 0.97 | 0.42 | 0.23 | 0.034 |
| TrAA | Ile | 1.37 | 0.37 | 0.44 | 0.002 |
| TrAA | Leu | 0.94 | 0.33 | 0.30 | 0.012 |
| TrAA | Pro | 0.80 | 0.29 | 0.29 | 0.014 |
| TrAA | Val | 0.51 | 0.32 | 0.13 | 0.126 |
| TrAA | Average TrAA | 0.88 | 0.23 | 0.46 | 0.001 |
| | Trophic position (average TrAA minus average SrcAA) | −0.01 | 0.03 | 0.00 | 0.880 |
| | Trophic position (Glu minus Phe method) | −0.04 | 0.06 | 0.02 | 0.545 |
| | ΣV | 0.04 | 0.12 | 0.01 | 0.724 |

Boldface indicates significant correlations at the P < 0.05 level. Strong correlations are seen for
individual source amino acids (SrcAA) and trophic amino acids (TrAA), including overall SrcAA and TrAA
group averages. Estimates of trophic position, calculated using two different formulations, as well as
microbial resynthesis (as indicated by the ΣV parameter) were uncorrelated with δ¹⁵N_bulk (n = 20). This
indicates that trends in δ¹⁵N_bulk time series are driven by δ¹⁵N at the base of the food web. Source amino
acids are glycine (Gly), serine (Ser), lysine (Lys), tyrosine (Tyr) and phenylalanine (Phe). Trophic amino
acids are glutamic acid + glutamine (Glx), aspartic acid + asparagine (Asx), alanine (Ala), isoleucine
(Ile), leucine (Leu), proline (Pro) and valine (Val). The two relative trophic position formulations were:
[(δ¹⁵N_average TrAA − δ¹⁵N_average SrcAA) − 3.4]/7.6 + 1 (ref. 19) and [(δ¹⁵N_Glu − δ¹⁵N_Phe) − 3.4]/7.6 + 1
(ref. 19). ΣV = (1/7) Σ Abs(δ¹⁵N_AA − δ¹⁵N_average TrAA) (ref. 19).


amino acids, as well as SrcAA and TrAA group averages, were signifi-
cantly correlated with δ¹⁵N_bulk with a slope near unity (Table 1). In con-
trast, calculated trophic position remained essentially constant, and
had no statistical relationship with δ¹⁵N_bulk. This implies that most of
the variability in δ¹⁵N_bulk (Fig. 2a) can be attributed to changes in source
nitrogen at the base of the food web. Among the SrcAA group, pheny-
lalanine (Phe) best preserves the baseline δ¹⁵N values, with negligible
fractionation through trophic transfers in planktonic food webs.
In the Makapuu corals (Fig. 1), δ¹⁵N_Phe decreased from an average of
4.1 ± 0.4‰ (1 s.d., n = 16) during the pre-1850 period, to a low of 2.3‰
in the most recent part of the record (Fig. 2a). This latter value is directly
within the range of present-day thermocline NO₃⁻ (δ¹⁵N_NO3 = 1.5–2.4‰,
n = 4) and sinking particulate nitrogen (δ¹⁵N_PNsink = 2.3–3.6‰,
n = 6) at station ALOHA (Fig. 3), confirming that δ¹⁵N_Phe in these
corals represents a close proxy for baseline δ¹⁵N values (Methods).
Further, both the δ¹⁵N_AA patterns and the analysis of the δ¹⁵N of TrAA
confirm that neither trophic position nor the microbial resynthesis of
sinking particles has significantly affected the observed trends in δ¹⁵N_bulk
(Table 1, Extended Data Fig. 3).
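
The two diagnostics named in the Table 1 footnote can be written out as a short Python sketch. The constants 3.4 and 7.6 and the ΣV definition are taken from that footnote; the function names and the example δ¹⁵N values are hypothetical and are not data from this study.

```python
from statistics import mean


def trophic_position(d15n_trophic, d15n_source):
    """Relative trophic position: [(d15N_trophic - d15N_source) - 3.4] / 7.6 + 1.

    Works for either formulation in the footnote: pass the group averages
    (TrAA, SrcAA) or the single-compound pair (Glu, Phe). Inputs in per mil.
    """
    return ((d15n_trophic - d15n_source) - 3.4) / 7.6 + 1.0


def sigma_v(trophic_d15n):
    """Mean absolute deviation of each trophic amino acid from the TrAA average.

    With the seven trophic amino acids listed in the footnote this is
    (1/7) * sum(|d15N_AA - d15N_average_TrAA|).
    """
    values = list(trophic_d15n)
    avg = mean(values)
    return sum(abs(v - avg) for v in values) / len(values)


# Hypothetical per-mil values for Glx, Asx, Ala, Ile, Leu, Pro, Val (illustration only).
example_traa = [12.0, 11.0, 13.5, 12.5, 11.5, 12.0, 13.0]
example_phe = 3.0  # hypothetical source amino acid value

print(round(trophic_position(mean(example_traa), example_phe), 2))
print(round(sigma_v(example_traa), 2))
```

A flat trophic position and ΣV across samples, as reported in Table 1, is what allows the bulk trend to be attributed to the nitrogen source rather than to food-web or microbial effects.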

On the multi-decadal timescale of the observed trends (Fig. 2a),
exported productivity in the NPSG is supported by two isotopically
distinct nitrogen sources: fixation of dissolved N₂ (δ¹⁵N_Nfix = −1‰),
and upward transport of NO₃⁻ (δ¹⁵N_NO3 = 6.5‰). Assuming
that inputs of N must be balanced by exports of sinking particulate
nitrogen, the proportional contribution of each source is reflected in
δ¹⁵N_PNsink. Therefore, a well-established, two-endmember mixing
model, F_Nfix = 1 − [(δ¹⁵N_PNsink − δ¹⁵N_Nfix)/(δ¹⁵N_NO3 − δ¹⁵N_Nfix)], where
F_Nfix is the fraction of N₂ fixation, has been widely applied to estimate
the contribution of N₂ fixation to export production at station ALOHA
(that is, about half at present). By substituting coral δ¹⁵N as a
proxy for δ¹⁵N_PNsink, the long-term change in N₂ fixation may be esti-
mated. Using this simple model, the change observed in the Makapuu
and Cross Seamount records (Δδ¹⁵N_bulk = Δδ¹⁵N_Phe = −2‰) indi-
cates a 27% increase in N₂ fixation since about 1850 AD (2‰ out of the
7.5‰ separating the two endmembers). For French
Frigate Shoals (Δδ¹⁵N_bulk = −1.3‰) the same calculation indicates a
17% increase. We note that the addition or modification of nitrogen-
isotope endmembers could potentially affect this interpretation; however,
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Figure 3 | Depth variations in water column 5'°N confirm K. haumeaa 
δ¹⁵N_Phe as a proxy for export production. Particulate nitrogen export at
station ALOHA is supported by N₂ fixation and the upward flux of deep
(>400 m) NO₃⁻. Remineralization of sinking particulate nitrogen contributes
to formation of a shallow (150–300 m) pool of isotopically light (~2.5‰)
NO₃⁻, which turns over in less than 15 years (ref. 22). Makapuu K. haumeaae
δ¹⁵N_Phe (analytical error <1.0‰) decreased from an average of 4.2‰ before
about 1850, to a low of 2.3‰ in the late twentieth century (red arrow), within
range of the shallow δ¹⁵N_NO3 and sinking particles.


none of the plausible mechanisms, such as atmospheric or land-based 
sources of nitrogen or changes in eastern Pacific denitrification, can 
explain the observed trends (Methods). 
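
The two-endmember mixing calculation described above is simple enough to check numerically. The sketch below uses the endmember values stated in the text (−1‰ for N₂ fixation, 6.5‰ for upwelled nitrate); the function names are hypothetical, and the reported "increase" is read here as a change in the fractional contribution of N₂ fixation, which is an interpretive assumption.

```python
D15N_NFIX = -1.0  # per mil, N2-fixation endmember (from the text)
D15N_NO3 = 6.5    # per mil, upwelled nitrate endmember (from the text)


def n2_fixation_fraction(d15n_sink, d15n_nfix=D15N_NFIX, d15n_no3=D15N_NO3):
    """F_Nfix = 1 - (d15N_sink - d15N_Nfix) / (d15N_NO3 - d15N_Nfix)."""
    return 1.0 - (d15n_sink - d15n_nfix) / (d15n_no3 - d15n_nfix)


def change_in_fraction(delta_d15n, d15n_nfix=D15N_NFIX, d15n_no3=D15N_NO3):
    """Change in F_Nfix implied by a shift delta_d15n in the sinking-particle proxy.

    The model is linear, so the change is independent of the starting value.
    """
    return -delta_d15n / (d15n_no3 - d15n_nfix)


# Shifts reported for the coral records (per mil).
for site, shift in (("Makapuu / Cross Seamount", -2.0), ("French Frigate Shoals", -1.3)):
    print(f"{site}: {100 * change_in_fraction(shift):.0f}% increase in N2-fixation fraction")

# Fractions implied by the Makapuu Phe values quoted in the text (4.1 and 2.3 per mil).
print(round(n2_fixation_fraction(4.1), 2), round(n2_fixation_fraction(2.3), 2))
```

Because the result depends only on the spacing between the endmembers, any revision of either endmember value rescales both percentages by the same factor.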

We offer two hypotheses for the long-term increase in N₂ fixation
indicated by the K. haumeaae records. The first involves a sustained
expansion of the NPSG since the end of the Little Ice Age. This idea is
supported by current seasonal trends in the NPSG extent and N₂ fix-
ation, as well as the δ¹⁵N_bulk offsets between sampling locations with
respect to the distribution of the nitrogen-to-phosphorus (P) stoichi-
ometric imbalance parameter N* (where N* = N − 16P + 2.9 µmol kg⁻¹)
(ref. 30) (Fig. 1). Currently, seasonal fluctuations in gyre areal extent
and nutrient inventories are associated with predictable changes in
both N₂ fixation and the δ¹⁵N of export production: at station
ALOHA, the larger summertime gyre corresponds with an approxi-
mately 1.5‰ decrease in δ¹⁵N_PNsink (refs 10, 23). Such seasonal changes
provide a model for how N₂ fixation and δ¹⁵N_PNsink might change with
progressive gyre expansion over a longer timescale. Further, the relative
offsets in the δ¹⁵N_bulk records between our three locations, as well as the
relative gradients of change observed (Fig. 2a), are also consistent with
an expanding gyre margin. Finally, longer-term expansion of the NPSG
on multiple timescales has also been inferred from satellite imagery,
models, and direct observation records (Extended Data Fig. 1; Extended
Data Table 1). Although recent shifts could also be related to interannual
to decadal natural cycles, a number of centennial-scale trends, inclu-
ding decreased precipitation, increased tropical Pacific sea surface
temperature (Fig. 2c) and inferred shifts in the latitudinal position of
the Pacific intertropical convergence zone after the Little Ice Age, are
all consistent with the idea of NPSG expansion.

A second hypothesis involves an increase in N₂ fixation linked to the
supply of iron-bearing dust aerosols. Hawaii is significantly affected
by bioavailable iron dust originating from Asia, which relaxes the iron
limitation for N₂-fixing diazotrophs. Millennial-length ice-core records
from Himalayan glaciers document up to fourfold increases in dust
concentrations over the past century, reflecting warming trends in
Asia (Fig. 2d). Such an increase might also account for the trend towards
higher rates of N₂ fixation indicated by the K. haumeaae δ¹⁵N records.
However, gyre expansion and increasing dust deposition are not
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mutually exclusive hypotheses, and we note that ultimately both are 
the result of a common driver—the Northern Hemisphere warming 
trends since the end of the Little Ice Age’. 

Overall, these data provide strong evidence for a progressive, 17% to
27% increase in N₂ fixation in the NPSG since the mid-nineteenth
century, unique with respect to the mid- to late Holocene and com-
parable in magnitude to the total change in global mean sedimentary
δ¹⁵N across the Pleistocene-Holocene transition. From a considera-
tion of the mechanisms that can most plausibly explain this shift, we
conclude that export productivity in the NPSG has been responding to
anomalous climate forcing over the past 150 years or so.

Given that the K. haumeaae records are a proxy for exported pro-
ductivity, these data indicate that the biogeochemical impacts of chang-
ing surface N₂ fixation propagate at least into the mesopelagic ocean
on centennial timescales. This signal, in both timing and amplitude, is
propagated into multiple higher trophic levels of the marine food web
and is observed in petrel bone bulk δ¹⁵N data: that is, the petrel data are
probably not reflecting declining trophic position, but instead changes
at the base of the food web. These dramatic changes in the world’s largest 
contiguous biome (NPSG) highlight the important role of nitrogen fixa- 
tion in the response of marine ecosystems to long-term climate change. 


METHODS SUMMARY 


Colonies of Hawaiian gold coral (K. haumeaae) were collected from water depths 
of approximately 450m with the HURL/NOAA Pisces V submersible between 
1997 and 2004 (ref. 14 and Extended Data Fig. 4). Air-dried colonies were sec- 
tioned near the skeletal base, polished and photographed under a binocular micro- 
scope. A computerized Merchantek micromill was used to mill samples, parallel to 
growth banding, at increments of 0.1 mm, along radial transects from the outer 
edge to the centre of each section (Extended Data Fig. 5). Each sample represents 
from 1 to 20 (average 5) years of growth, depending on growth rate. Radiocarbon 
dating of sample aliquots (Extended Data Table 2) was used for age modelling 
(Extended Data Fig. 6). 

Bulk δ¹⁵N isotope ratios, defined as δ¹⁵N = [(¹⁵N/¹⁴N)_sample/(¹⁵N/¹⁴N)_standard − 1] × 1,000,
were measured using a Carlo Erba 1108 elemental analyser interfaced
to a ThermoFinnigan Delta Plus XP isotope ratio mass spectrometer (IRMS).
Values are reported in per mil (‰) units relative to atmospheric N₂ (δ¹⁵N_air = 0‰).
Reproducibility, as measured by the difference in sample replicates, averaged <0.15‰.

Amino-acid-specific δ¹⁵N_AA was measured on sample composites (combining 3
to 10 separate samples to obtain a total mass of 15-20 mg). Composites were
hydrolysed in 100 ml of 6 N HCl for 20 h, and spiked with a norleucine internal
standard followed by formation of trifluoroacetyl/isopropyl ester derivatives.
These derivatives were analysed on a Thermo Trace Ultra gas chromatograph, fitted
with a SGE BPX5 capillary column (60 m × 0.32 mm internal diameter, 1 µm film
thickness), in line with the oxidation and reduction furnaces, and linked to a
ThermoFinnigan Delta Plus XP IRMS. Samples were analysed in quadruplicate.
Analytical accuracy was monitored by analysis of the norleucine internal standard
and co-derivatized amino-acid external standards for which authentic δ¹⁵N values
of each amino acid were determined offline. Reproducibility for individual amino-
acid values was typically better than 1‰.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
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METHODS 

Sample collection and preparation. Colonies of Hawaiian gold coral were col- 
lected from water depths of approximately 450 m with the HURL/NOAA Pisces V 
submersible between 1997 and 2004 (ref. 14 and Extended Data Fig. 4). Tradi-
tionally known as Gerardia sp., the Hawaiian gold coral was recently reclassified as
a newly erected genus and species, Kulamanamana haumeaae. All of the specimens
were collected alive, with the exception of specimen Ger9702 from Makapuu,
which was dead when collected. Air-dried colonies were sectioned near the skeletal
base with a rock saw. Sections 10 mm thick were mounted on 50 mm × 75 mm
glass slides with epoxy, ground and polished on diamond laps, ultrasonically cleansed
in isopropyl alcohol, and photographed under a binocular microscope. A compu- 
terized Merchantek micromill was used to mill samples, parallel to growth band- 
ing, at increments of 0.1 mm, along radial transects from the outer edge to the 
centre of each section (Extended Data Fig. 5). Each approximately 5-mg sample 
represents from 1 to 20 (average 5) years of growth, depending on growth rate. 
Radiocarbon dating and age models. Radiocarbon dating was performed on 8 to 
10 sample aliquots from each section of K. haumeaae. Radiocarbon measurements 
were performed at the Center for Accelerator Mass Spectrometry (CAMS), Lawrence 
Livermore National Laboratory. Aliquots (1 mg) were combusted in individual quartz 
tubes and reduced to graphite in the presence of iron catalyst. Results include a 
background and δ¹³C correction and are reported as Fraction modern ¹⁴C (ref. 32;
Extended Data Table 2). Radiocarbon age calibrations and age models were gene-
rated with Clam version 2.0 (ref. 33), using the Marine09 database with a local
reservoir (ΔR) correction of −34 ± 13 years, based on a chronology of surface
water ¹⁴C from Hawaiian reef corals. Post-bomb values were calibrated directly
to the Hawaiian surface water chronology. Age models were fitted with a spline
function with a smoothing level of 0.6. Models were run over 1,000 iterations, from 
which mean ages and 95% confidence levels were calculated (Extended Data Fig. 6). 
Bulk N isotope analysis. δ¹⁵N isotope ratios, defined as δ¹⁵N = [(¹⁵N/¹⁴N)_sample/(¹⁵N/¹⁴N)_standard − 1] × 1,000,
were measured on 0.7-mg aliquots of all the milled
samples using a Carlo Erba 1108 elemental analyser interfaced to a ThermoFinnigan
Delta Plus XP isotope ratio mass spectrometer (IRMS). Values are reported in per
mil (‰) units relative to atmospheric N₂ (δ¹⁵N_air = 0‰). Raw isotope values were
corrected for instrument drift and linearity effects and calibrated against the in-
house isotopic reference materials of the Stable Isotope Lab, University of California,
Santa Cruz, using standard laboratory protocols (http://es.ucsc.edu/~silab/index.php).
Reproducibility, as measured by the difference in sample replicates, averaged <0.15‰.
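
As an illustration of the delta notation defined above, the following Python sketch converts between an isotope ratio and its per-mil δ¹⁵N value. The numerical ¹⁵N/¹⁴N ratio used for atmospheric N₂ is an assumed, commonly quoted value that does not appear in the source; the function names are hypothetical.

```python
# Assumed 15N/14N ratio of atmospheric N2 (the reference standard); not given in the text.
R_AIR_ASSUMED = 0.0036765


def delta15n(r_sample, r_standard=R_AIR_ASSUMED):
    """delta15N (per mil) = [(15N/14N)_sample / (15N/14N)_standard - 1] * 1000."""
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta, r_standard=R_AIR_ASSUMED):
    """Inverse of delta15n: recover the 15N/14N ratio from a per-mil value."""
    return r_standard * (delta / 1000.0 + 1.0)


# The standard itself sits at 0 per mil by definition.
print(delta15n(R_AIR_ASSUMED))

# Ratios corresponding to the pre-1850 and most recent Makapuu Phe values in the text.
for value in (4.1, 2.3):
    print(value, ratio_from_delta(value))
```

The round trip makes clear how small the underlying ratio differences are: a 2‰ shift corresponds to a change in the seventh decimal place of the ¹⁵N/¹⁴N ratio.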
AA hydrolysis and derivatization. Sample composites (combining 3 to 10 sepa-
rate samples to obtain a total mass of 15-20 mg) were hydrolysed in 100 ml of 6 N
HCl for 20 h, and spiked with 6 µl of norleucine internal standard (δ¹⁵N = 7.9‰).
Hydrolysates were evaporated to dryness under a stream of N₂ and stored in a desi-
ccator overnight, followed by formation of trifluoroacetyl/isopropyl ester derivatives.
Amino-acid compositional analysis. Amino-acid molar composition was deter- 
mined with an Agilent 7890A gas chromatograph fitted with a SGE BPX-5 column 
(60 m × 0.32 mm internal diameter, 1 µm film thickness). Response factors were
determined with a dilution series of an external amino-acid standard mixture of 14 
common protein amino acids. Reproducibility, as measured by the standard devi- 
ation of analytical replicates, averaged <5 mol.%. Arginine (Arg), cysteine (Cys) 
and histidine (His) concentrations could not be determined owing to their break- 
down during acid hydrolysis. 

δ¹⁵N amino-acid analysis. Measurement of the δ¹⁵N of amino-acid derivatives
was performed using previously published procedures. Derivatives were analysed
on a Thermo Trace Ultra gas chromatograph, fitted with a SGE BPX5 capillary
column (60 m × 0.32 mm internal diameter, 1 µm film thickness), in line with the
oxidation and reduction furnaces, and linked to a ThermoFinnigan Delta Plus XP
IRMS. Samples were analysed in quadruplicate. Analytical accuracy was moni-
tored by analysis of the norleucine internal standard and co-derivatized amino-
acid external standards for which authentic δ¹⁵N values of each amino acid were
determined offline. Reproducibility for individual amino-acid values was typically
better than 1‰. The δ¹⁵N of methionine (Met) was not measured owing to insuffi-
cient concentrations.

Oceanographic data. The dynamic height and N* data in Fig. 1 use gridded data 
from World Ocean Atlas 2009 (http://www.nodc.noaa.gov/OC5/WOA09/woa09 
data.html). Data in Extended Data Fig. 1 and Extended Data Table 1 use ocean 
station data from World Ocean Database 2009 (http://www.nodc.noaa.gov/OC5/ 
SELECT/dbsearch/dbsearch.html) for the region 17.5-27.5° N, 150-170° W. Data 
were filtered for outliers and plotted using Ocean Data View software, version 
4.3.10 (R. Schlitzer, http://odv.awi.de). 

Climate records. Temperature record in Fig. 2c is from the NHHAD_EIV 
Northern Hemisphere land and ocean multiple proxy temperature reconstruction 
of ref. 8, smoothed with a 15-year running average. The tropical Pacific sea surface 
temperature record is from the compilation of reef coral oxygen isotope records of 
ref. 7, smoothed with a 15-year running average. The dust record in Fig. 2d is from 


a compilation of the Dasuopo, Dunde and Guliya Himalayan ice cores”. Indivi- 
dual decadal records were normalized by subtracting the mean and dividing by the 
standard deviation for each record. The composite record is the average of the 
three individual records, smoothed with a 20-year running average. 
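
The compositing steps described above (standardize each record, average across records, then smooth) can be sketched as follows. This is an illustration under stated assumptions: the source does not say whether a population or sample standard deviation was used (population is assumed here), how the 20-year window maps onto decadal samples, or how record ends were treated (a shrinking centred window is assumed). All names and the example series are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Subtract the record mean and divide by its standard deviation (population s.d. assumed)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("record has zero variance")
    return [(v - mu) / sd for v in values]


def composite(records):
    """Average the standardized records point by point; records must share a time axis."""
    if len({len(r) for r in records}) != 1:
        raise ValueError("records must have equal length")
    normalized = [standardize(r) for r in records]
    return [mean(column) for column in zip(*normalized)]


def running_mean(values, window):
    """Centred running mean over `window` samples; the window shrinks at the ends (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


# Three hypothetical decadal dust series on a common time axis (illustration only).
example_records = [
    [1.0, 1.2, 0.9, 1.5, 2.4],
    [10.0, 11.0, 9.5, 14.0, 30.0],
    [0.3, 0.3, 0.4, 0.5, 0.9],
]
print([round(v, 2) for v in running_mean(composite(example_records), 3)])
```

Standardizing first keeps any one core with large absolute concentrations from dominating the average.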

Deep-sea coral 5'°N as a proxy for sinking particulates. It has been well estab- 
lished that the skeletons of K. haumeaae and other deep-sea proteinaceous corals 
are biogeochemically tightly coupled to surface waters. The presence of bomb radio- 
carbon (that is, post-1950) signatures in the living tissues and skeletal protein of 
deep-sea corals establishes recently exported particles as their main food source”’. 
The incorporation of detailed bomb-radiocarbon chronologies within the skeletal 
growth rings, without any significant attenuation or time-lag relative to surface 
water chronologies, further establishes that euphotic zone biogeochemical signa-
tures are transmitted to deep-sea corals efficiently and rapidly. The δ¹⁵N_Phe
signature in the Makapuu K. haumeaae specimens closely matches the δ¹⁵N of
both sinking particulates and thermocline NO₃⁻ (Fig. 3). Data from two other
oceanographic regions, the northwest Atlantic and the denitrification-affected
eastern north Pacific, show that coral δ¹⁵N_Phe values closely follow the same pattern
as NO₃⁻ with respect to N* (Extended Data Fig. 7).

Steady state isotopic mass balance model assumptions. The isotopic mass balance
model assumes that the export flux of particulate N at station ALOHA and
surrounding waters of the NPSG is balanced by N₂ fixation and upward flux of NO₃⁻
(refs 4, 10, 13, 22 and 23). There is no isotopic impact from incomplete NO₃⁻
assimilation, because NO₃⁻ in the mixed layer is exhausted to <0.2 μmol kg⁻¹
year-round (Hawaii Ocean Time-series Data Organization and Graphical System
(HOT-DOGS); see http://hahana.soest.hawaii.edu/hot/hot-dogs/interface.html).
Additional nitrogen isotopic endmembers are negligible to the nitrogen budget.
Atmospheric deposition of organic nitrogen accounts for <2% of the particulate
nitrogen export at 150 m (refs 4, 23 and 30). Terrestrial runoff of fertilizers (values
near −2‰) can depress local seawater δ¹⁵N, but the similar trends at all three
locations, including French Frigate Shoals (Fig. 1), rule out any direct land-use
effect from the Hawaiian islands.
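Under these assumptions the export flux has only two nitrogen sources, so a two-endmember mixing equation applies. The sketch below shows the generic form of such a balance; the function names and the endmember values in the example are illustrative placeholders, not values taken from the paper.

```python
def n2_fixation_fraction(delta_export, delta_nitrate, delta_fixation):
    """Fraction f of exported N supplied by N2 fixation.

    Solves delta_export = f * delta_fixation + (1 - f) * delta_nitrate,
    with all delta values in per mil.
    """
    if delta_nitrate == delta_fixation:
        raise ValueError("endmembers must differ to resolve a mixing fraction")
    return (delta_nitrate - delta_export) / (delta_nitrate - delta_fixation)


def export_delta(fraction_fixation, delta_nitrate, delta_fixation):
    """Forward model: delta-15N of export for a given N2-fixation fraction."""
    return (fraction_fixation * delta_fixation
            + (1.0 - fraction_fixation) * delta_nitrate)


# Hypothetical endmembers (per mil), chosen only to make the arithmetic visible.
f = n2_fixation_fraction(delta_export=2.5, delta_nitrate=6.0, delta_fixation=-1.0)
```

With these placeholder numbers, an export signature halfway between the two endmembers yields a fixation fraction of one half; a lower export δ¹⁵N implies a larger contribution from N₂ fixation.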

Finally, modification of Pacific midwater δ¹⁵N_NO3 could arise from changes in
the rates of denitrification along the eastern Pacific margin. However, this is very
unlikely to account for the trends in δ¹⁵N of Hawaiian K. haumeaae (Fig. 2) for the
following reasons. (1) Denitrification would have to decrease to account for the
observed downward trends in K. haumeaae δ¹⁵N, whereas in fact denitrification
has increased in global oxygen minimum zones, including the ETNP. (2) The
geostrophic circulation isolates the NPSG from a tongue of denitrification-affected
waters, of which the northern boundary lies south of the Hawaiian archipelago at
a latitude of 10° N (refs 30 and 41). (3) Oceanographic data from the Hawaiian
archipelago show that oxygen levels in the upper 500 m have remained constant or
have increased (Extended Data Fig. 1; Extended Data Table 1). This observation
is consistent with expansion of the NPSG and inconsistent with any impact of
denitrification from the eastern Pacific margin.
Extended Data Figure 1 | World Ocean Database 2009 instrumental records
demonstrate physical and biological changes in the NPSG since 1950.
Increasing salinity and temperature are accompanied by a strong decrease in
silicate and phosphate nutrient concentrations, and an increase in N*.
These data are consistent with the previously observed shift from a dominantly
eukaryotic to dominantly prokaryotic (N₂-fixation) ecosystem in the
NPSG. Increasing rates of change with depth suggest large-scale changes in
oceanographic circulation, consistent with expansion of the NPSG. Rates of
change across separate depth bins are provided in Extended Data Table 1.
Extended Data Figure 2 | Distribution of amino-acid concentration and
δ¹⁵N_AA in K. haumeaae. a, Amino-acid mole per cent composition.
b, δ¹⁵N_AA data, showing two main groups: relatively δ¹⁵N-enriched ‘trophic’
amino acids (TrAA), and relatively lower-δ¹⁵N ‘source’ amino acids (SrcAA).
These patterns are very close to those of heterotrophic fresh biomass.
Together with low ΣV values (Table 1), this supports the use of δ¹⁵N_Phe as a
proxy for the δ¹⁵N of exported production, and indicates that δ¹⁵N values have
not undergone any significant diagenetic alteration. Extremely low values of
threonine (Thr) are consistent with previous observations, and this amino
acid is now understood to be neither a trophic nor source amino acid.
Mean δ¹⁵N_bulk is shown for context. Error bars represent 1 s.d. (n = 20).
Extended Data Figure 3 | Timeseries of δ¹⁵N_AA parameters in comparison to
δ¹⁵N_bulk from two K. haumeaae specimens from Makapuu. Closed symbols,
specimen Ger9701; open symbols, specimen Ger9702. Trophic position and
ΣV parameters are defined in Table 1.
Extended Data Figure 4 | Example K. haumeaae colony photographed
in situ. Photo credit: NOAA Hawaiian Undersea Research Laboratory, DSRV 
Pisces Pilots & Engineers, 2004. 
Extended Data Figure 5 | Example K. haumeaae skeletal cross-section.
Specimen Ger9701 from Makapuu. Red lines indicate micromill transect. 
… represent conventional ¹⁴C calibrated age distributions. Green-shaded areas are
post-bomb (post-1950) age distributions. Black curves and grey-shaded regions
represent spline fits with 95% confidence intervals, respectively. The average …
models owing to age reversals. The shape of the curves reflects variable growth
rates in the four coral samples illustrated.
Extended Data Figure 7 | Relationship between δ¹⁵N and N* across
different oceanographic regions. The δ¹⁵N_NO3 is driven by isotopic
fractionation processes of denitrification, nitrate uptake and remineralization
of diazotrophic organic nitrogen (ETNP, Eastern Tropical North Pacific;
ETSP, Eastern Tropical South Pacific; redrawn from ref. 46). Existing deep-sea
coral δ¹⁵N_Phe data (red points) closely follow the same overall pattern.
Data points are: (1) Hawaii K. haumeaae (2 specimens, 20 measurements;
this paper); (2) Northwest Atlantic Primnoa resedaeformis (2 specimens,
8 measurements); (3) Monterey Bay Isidella sp. (1 specimen,
10 measurements). Error bars show the total range of measurements.
Corresponding values of N* are obtained from the nearest surface water grid
point of the World Ocean Atlas 2009 (http://www.nodc.noaa.gov/OC5/
WOA09/woa09data.html).
Extended Data Table 1 | World Ocean Database 2009 physical and chemical parameters

                             0-150 m                      151-300 m                    301-500 m
Parameter                    Slope ± Std. error  P value  Slope ± Std. error  P value  Slope ± Std. error  P value
Temperature (°C)             +0.004 ± 0.007      0.566
Salinity (p.s.u.)            +0.003 ± 0.001      0.012
Sigma-theta (kg m⁻³)         +0.001 ± 0.003      0.656
Oxygen (ml per litre)        −0.001 ± 0.001      0.342
Silicate (μmol per litre)    −0.074 ± 0.023      0.003
Phosphate (μmol per litre)   −0.004 ± 0.001      <0.001
Nitrate (μmol per litre)     −0.004 ± 0.002      0.071
N*                           0.035 ± 0.011       0.004
Chlorophyll (μg per litre)   +0.002 ± 0.001      0.014    +0.001 ±

Changes in the parameters in the NPSG are shown from 1950 to 2010, across multiple depth bins. Boldface P values indicate statistically significant trends at the P < 0.05 level. Sigma-theta is the density of sea water
at a reference level of 0 dbar (that is, at a depth of 0 m). NA, not available.
Extended Data Table 2 | K. haumeaae radiocarbon data

Sample  Location  CAMS ID  Distance (mm)  δ¹³C  Fraction modern  Error  Δ¹⁴C  Error  ¹⁴C Age  Error

Ger9701  Makapuu  156055  1.1138  0.0042  4.2  >Modern
                  156056  0.9482  0.0029  2.9  425
                  156057  0.9339  0.0035  3.5  550
                  156058  0.9321  0.0028  2.8  565
                  156059  0.9147  0.0046  4.6
                  156060  0.8884  0.0028  2.8  950
                  156061  0.8816  0.0029  2.9
                  156062  0.8684  0.0029  2.9
Ger9702  Makapuu  153722  0.9414  0.0039  3.9  485
                  153723  0.9334  0.0034  3.4  555
                  56064  0.9170  0.0030  3.0  695
                  153724  0.9204  0.0030  3.0  665
                  56065  0.9006  0.0026  2.6  840
                  153725  0.8780  0.0037  3.7  1045
                  56066  0.8711  0.0029  2.9  1110
                  153726  0.8583  0.0037  3.7  1225
Cross Seamount    57079  1.0736  0.0044  4.4  >Modern
                  57080  1.0855  0.0038  3.8  >Modern
                  57081  0.9714  0.0028  2.8  >Modern
                  57082  0.9592  0.0028  2.8  335
                  57083  0.9419  0.0027  2.7  480
                  57084  0.9347  0.0027  2.7  540
                  57085  0.9186  0.0027  2.7  680
                  57086  0.9264  0.0038  3.8
                  57087  0.9073  0.0024  2.4  780
                  57088  0.8956  0.0026  2.6  885
PV694  French Frigate Shoals  59298  0.9699  0.0031  3.1
                  59299  0.9441  0.0030  3.0  460
                  159300  0.9186  0.0028  2.8  680
                  59301  0.9082  0.0030  3.0  775
                  159302  0.8884  0.0030  3.0  950
                  59303  0.8941  0.0029  2.9  900
                  159304  0.8779  0.0037  3.7
                  159305  0.8579  0.0039  3.9
                  159433  0.8326  0.0027  2.7

Note that specimen Ger9702 was collected dead; the other three specimens were collected alive. CAMS, Center for Accelerator Mass Spectrometry.
Low investment in sexual reproduction threatens
plants adapted to phosphorus limitation


Yuki Fujita, Harry Olde Venterink, Peter M. van Bodegom, Jacob C. Douma, Gerrit W. Heil, Norbert Hölzel, Ewa Jabłońska,
Wiktor Kotowski, Tomasz Okruszko, Paweł Pawlikowski, Peter C. de Ruiter & Martin J. Wassen


doi:10.1038/nature12733 


Plant species diversity in Eurasian wetlands and grasslands depends
not only on productivity but also on the relative availability of nutrients,
particularly of nitrogen and phosphorus. Here we show that
the impacts of nitrogen:phosphorus stoichiometry on plant species
richness can be explained by selected plant life-history traits, notably
by plant investments in growth versus reproduction. In 599 Eurasian
sites with herbaceous vegetation we examined the relationship between
the local nutrient conditions and community-mean life-history traits.
We found that compared with plants in nitrogen-limited communities,
plants in phosphorus-limited communities invest little in
sexual reproduction (for example, less investment in seed, shorter
flowering period, longer lifespan) and have conservative leaf economy
traits (that is, a low specific leaf area and a high leaf dry-matter content).
Endangered species were more frequent in phosphorus-limited
ecosystems and they too invested little in sexual reproduction. The
results provide new insight into how plant adaptations to nutrient
conditions can drive the distribution of plant species in natural ecosystems
and can account for the vulnerability of endangered species.

Species diversity is influenced both by overall nutrient availability and
by nutrient stoichiometry, that is, how the ratio of available nutrients
relates to their consumers’ requirements. In terrestrial plant communities
the two nutrients that most frequently limit growth are nitrogen
(N) and phosphorus (P). It has long been recognized that in sites that
are N- or P-limited, the species assembly is different; this difference
is also reflected in the occurrence of endangered species, most of
which are found in P-limited sites. The explanation of why these species
are associated with P-limited sites may lie in their functional traits.
In general, fast-growing species dominate in nutrient-rich environments,
whereas slow-growing species dominate in nutrient-poor conditions.
With respect to the N:P stoichiometry, a number of studies indicate that
low N:P-ratio environments favour fast-growing species with long roots,
or species that fix N, whereas high N:P-ratio environments favour
slow-growing species with specialized P uptake traits; for example, cluster
roots, arbuscular mycorrhizae or high phosphatase activity. The
association between fast-growing species and low N:P ratios is also
consistent with the growth rate hypothesis, which states that a fast
growth rate is enabled by high investment in P-rich RNA, resulting in
relatively high leaf P concentrations and concomitant low N:P ratios.
However, these particular traits do not necessarily explain differences
in total species richness along N:P availability gradients and, moreover,
it seems possible that the selection for these traits may depend on
environmental conditions other than the relative availability of N and
P. We looked for an explanation based on inherent plant life-history
traits, particularly investments in growth versus reproduction. We had
access to a large comparative data set, which enabled us to separate the
effects of overall nutrient availability from those of N:P stoichiometry
across many species. In addition, by linking our traits analysis to the
Red List statuses of the species involved (Red Lists of seven different
countries), we were able to assess the mechanisms that might account
for why certain species are more vulnerable to extinction than others.
Figure 1 | Relationship between biodiversity indices of vascular plants and
N:P ratio corrected for productivity effects. a-c, Tested biodiversity indices
are the number of species (a), the number of endangered species (b), and
the percentage of endangered species (c) (number of sites (n) = 539). N:P ratio
was corrected for the confounding effects of productivity by using the residual
values of N:P ratio regressed by productivity (see Supplementary Discussion 1
and Extended Data Fig. 1). τth quantile regression functions (τ = 0.50, 0.75,
0.90, 0.95) are also shown. See Extended Data Fig. 2 for the 95% confidence
intervals of the regression coefficients. The results were not biased by the
selection of habitat types in our data set (Extended Data Fig. 3).
Figure 2 | Relationship between N:P ratios and community-mean trait
values of herbaceous vascular plant species, after removing confounding
effects of productivity. The residual values of N:P ratio (x axis) and
community-mean traits (y axis; Extended Data Fig. 4) regressed by productivity
were used (see Supplementary Discussion 1 and Extended Data Fig. 1).
a-o, The 15 traits tested are canopy height (a, number of sites (n) = 530), leaf
mass (b, n = 525), specific leaf area (c, n = 529), leaf dry matter content
(d, n = 525), seed mass (e, n = 533), seed number per shoot (f, n = 524), seed
investment (g, n = 523), month flowering started (h, n = 528), flowering
period (i, n = 528), lateral spread (j, n = 526), reproduction by seeds
(k, n = 528), vegetative reproduction (l, n = 528), lifespan (m, n = 531), plant
architecture (n, n = 533), and N fixation (o, n = 502). Extended Data Table 1
gives abbreviations and trait units. Linear regression models are shown, plus
their standardized coefficient values (β) and two-tailed P values of the
coefficients (***P < 0.001; NS, not significant). Bar graphs show fraction
of variance (for continuous traits) or deviance (for binary traits) of
community-mean trait values explained by productivity and N:P ratio.
Variances in community-mean trait values are separated into unique effects
of productivity (green) and N:P ratio (purple), and shared effects of
productivity and N:P ratio (grey). Negative shared effects indicate that the trait
is suppressed by interaction between productivity and N:P ratio. For these
traits, total variance explained (dotted lines) is smaller than the sum of unique
effects of N:P ratio and productivity.

The data set consisted of 599 field plots in herbaceous ecosystems in
Eurasia, with data on plant species composition, aboveground productivity
of vascular plants as a proxy for overall nutrient availability, and
the N:P:potassium (K) ratio in the aboveground plant biomass as a proxy
for nutrient stoichiometry (see Supplementary Discussion 1). Sites
considered to be K-limited (n = 60, based on N:K and K:P ratios) were
excluded from further analysis. Across the remaining part of the data
set (all N- and/or P- (co-)limited, n = 539), species richness of vascular
plants is highest at intermediate N:P ratios (N and P co-limitation;
Fig. 1a), whereas the numbers of endangered species tend to be highest
at higher N:P ratios (Fig. 1b) and the percentage of endangered species
rises as N:P ratios increase towards P limitation (Fig. 1c). For 446
herbaceous plant species occurring in the 539 plots, we retrieved 15
life-history plant traits related to nutrient acquisition, growth and reproductive
strategy (see Extended Data Table 1). After accounting for potentially
confounding effects of productivity on N:P ratio and community-mean
trait values (see Extended Data Fig. 1), we found that plots with a high
N:P ratio were significantly (P < 0.001) associated with the occurrence
of species with a small investment in sexual reproduction (such as low
seed number (Fig. 2f) and seed investment (Fig. 2g), late start of flowering
(Fig. 2h), short flowering period (Fig. 2i), vegetative reproduction
rather than reproduction by seeds (Fig. 2k, l), perennials rather than
annuals or biennials (Fig. 2m)) and less association with N fixers (Fig. 2o).
Species occurring at high N:P ratio plots also had leaf traits characteristic
of slow-growing species, such as low specific leaf area (SLA) (Fig. 2c)
and high leaf dry-matter content (LDMC) (Fig. 2d). Some traits (such
as canopy height, leaf mass and plant architecture (monocot or eudicot))
correlated strongly (P < 0.001) with productivity but not (P > 0.05)
with N:P ratio, as shown by a dominant contribution of productivity to
their explained variance (Fig. 2a, b, n). The relationships between N:P
ratio and particular plant traits also became apparent when species were
classified into ‘strategies’ according to a previous paper; that is, the N:P
ratios correlated positively with abundance of ‘stress tolerators’ (P < 0.001),
correlated negatively with abundance of ‘ruderals’ (P < 0.001), and were
not significantly correlated with abundance of ‘competitors’ (P > 0.05)
(Fig. 3). Furthermore, a principal component analysis (PCA) indicated
that the trade-off between seed and vegetative reproduction was strongly
correlated (P < 0.001) with N:P ratio, with the unique effect of N:P ratio
being stronger (21% of variation explained) than that of productivity
(9% of variation explained), a finding that confirmed the robustness of
our analysis (see Supplementary Discussion 2).
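The "unique" and "shared" effects of productivity and N:P ratio mentioned here can be illustrated with a standard two-predictor variance partition. The sketch below is one common way of doing this (commonality analysis from squared correlations); it is an assumption that the authors' partitioning follows the same arithmetic, and the data in the example are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partition_variance(y, x1, x2):
    """Split R^2 of y ~ x1 + x2 into unique and shared components."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    # R^2 of the two-predictor linear model, from pairwise correlations.
    r2_full = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return {
        "unique_x1": r2_full - r2 ** 2,
        "unique_x2": r2_full - r1 ** 2,
        # A negative shared term signals suppression between predictors.
        "shared": r1 ** 2 + r2 ** 2 - r2_full,
        "total": r2_full,
    }


# Hypothetical, uncorrelated predictors: x1 ~ N:P ratio, x2 ~ productivity.
np_ratio = [-1.0, 1.0, -1.0, 1.0]
productivity = [-1.0, -1.0, 1.0, 1.0]
trait = [2 * a + b for a, b in zip(np_ratio, productivity)]
parts = partition_variance(trait, np_ratio, productivity)
```

In this toy case the predictors are uncorrelated, so the shared component is zero and the two unique components add up to the total explained variance.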

We found three contrasting but not mutually exclusive relationships 
between nutrient availability or stoichiometry and life-history plant 
traits. First, overall nutrient availability—but not N:P stoichiometry— 
is related to traits involved in competition for light (for example, plant 
size, ‘competitor’ strategy). Second, both N:P stoichiometry and overall 
nutrient availability are related to leaf economy traits (for example, specific 
leaf area, leaf dry-matter content). Previous studies have shown that 
leaf economy traits are related to overall nutrient availability'””, but 
our results show that fast-growing species also have an affinity for low 
N:P ratio environments, independently of the overall nutrient availa- 
bility effect. Third, N:P stoichiometry is related to investment in sexual 
reproduction almost independently of overall nutrient availability; that 
Figure 3 | Relationship between N:P ratios and community-mean values for
C, S and R scores of CSR strategy after removing confounding effects of
productivity. a, C (competitor) scores. b, S (stress tolerator) scores. c, R
(ruderal) scores. Confounding effects of productivity were removed using the
residual values of N:P ratio (x axis) and community-mean CSR scores (y axis)
regressed by productivity. Number of sites (n) = 528; see Fig. 2 legend for
figure specifications.
is, high N:P conditions (P limitation) correlated with low investment in
sexual reproduction. Such low investment restricts P losses, since
reproductive organs are P-rich. Classical allocation studies have shown that
plants can invest up to 50 to 60% of all acquired P in sexual reproduction,
and that this percentage is generally higher for P than for N and
other elements. Impaired investment in sexual reproduction under
P limitation is also supported by experimental data. An alternative
strategy for a plant is to produce few seeds in order to maintain a high
P concentration per seed, which is an important factor for successful
recruitment in P-impoverished soils.

Our study clearly shows that endangered species have different suites
of functional traits than non-endangered species (permutational
multivariate analysis of variance (PERMANOVA), F1,281 = 2.67, P < 0.05).
Compared with non-endangered species, they have a lower canopy height, 
less investment in sexual reproduction (fewer seeds and smaller seed mass 
per individual), a shorter flowering period and a later starting time of 
flowering, and are perennials rather than annuals (all differences signi- 
ficant at least at the P < 0.05 level) (Fig. 4). The lower reliance of endan- 
gered species on seed reproduction is also seen in the PCA axis scores 
(Supplementary Discussion 2), and confirms the findings of previous 
Figure 4 | Difference in trait values between endangered and
non-endangered herbaceous vascular plant species. The difference was expressed
with Cohen’s d for continuous traits, and with log-odds ratios for binary traits.
The numbers of endangered versus non-endangered species examined are
(from top to bottom): 150/267, 123/237, 128/246, 121/239, 126/243, 113/226,
101/223, 153/265, 153/265, 149/259, 149/259, 149/259, 149/253, 149/253,
155/268, 152/281 and 157/289. Positive values of Cohen’s d or log-odds ratios
indicate that endangered species have higher trait values. 95% confidence
intervals are shown as whiskers. Lateral spread, for which effect size could not
be calculated, was not significantly different between endangered and
non-endangered species (two-tailed Mann-Whitney U-test, P > 0.05).




studies comparing common with endangered or rare species (for example,
a shorter flowering period, smaller seed mass and poorer dispersal
ability).
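The two effect sizes used to compare endangered with non-endangered species (Cohen's d for continuous traits, log-odds ratio for binary traits; Fig. 4) follow textbook definitions. The sketch below uses the pooled-standard-deviation form of Cohen's d, which is an assumption about the variant used, and the input values are hypothetical.

```python
from math import log, sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled s.d."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def log_odds_ratio(a_with, a_without, b_with, b_without):
    """Log of the odds of a binary trait in group a relative to group b.

    All four counts must be non-zero; no continuity correction is applied.
    """
    return log((a_with / a_without) / (b_with / b_without))


# Hypothetical trait values and counts (not data from the paper).
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
lor = log_odds_ratio(a_with=10, a_without=20, b_with=20, b_without=10)
```

Positive values mean the first group (here, by analogy, the endangered species) has the higher trait value or the higher odds of showing the trait.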

Endangered species occur more frequently under P-limited conditions
(high N:P ratio environments) than can be explained by chance,
as shown for temperate regions in our data set and in a previously studied,
much more limited data set, as well as for a tropical region. Our trait
analysis provides two possible explanations for the frequent occurrence
of endangered species in P-limited conditions. The first is that
endangered species are often small and are therefore poor competitors
for light. Small size is a major disadvantage when growing in productive
sites, but on poorly productive sites, which are associated with a
high plant N:P ratio (see Supplementary Discussion 1), they face little
competition. Second, the relatively low investment of endangered species
in sexual reproduction is characteristic of plant species under P limitation
(high N:P ratio environments). Thus, both increased productivity
of ecosystems and changed N:P stoichiometry potentially threaten the
survival of such species; moreover, their low dispersal capacity makes
them vulnerable to such threats. The idea that endangered species are
vulnerable to changes in the relative availabilities of N and P is supported
by a global study showing that species with a narrow geographical
range (that is, those more likely to become endangered) have
higher leaf N:P ratios than those with a wide range. The exact mechanisms
and potentially interacting processes that may explain why species
vulnerable to extinction occur on P-limited sites need to be tested further.
However, it is likely that large-scale P enrichment of herbaceous ecosystems
that boosts productivity and ends P limitation causes species
adapted to P limitation to be more vulnerable to extinction. Moreover,
the low investment in sexual reproduction of these species, which is a
beneficial trait in P-poor environments, is a drawback for their dispersal
ability. N fertilization will probably not promote survival of
endangered species, as there are a number of mechanisms for increasing
P uptake from diverse forms of P in soil (for example, root exudates,
mycorrhizae) under N-rich conditions, and therefore P limitation might
not be enhanced by N enrichment. Instead, to better protect endangered
species, we should aim to preserve P-limited and poorly productive
sites. Given that these sites are already scarce and scattered,
that landscapes are increasingly human-influenced and urbanized,
and that endangered species have less sexual reproduction (and so are
disadvantaged in long-distance dispersal), it is clear that these species’
vulnerability to extinction is acute.


METHODS SUMMARY 


Species composition of vascular plants, their aboveground biomass (as a proxy for 
site productivity), and N:P:K ratio in the biomass (as a proxy for relative availa- 
bility of N, P and K for plants) were recorded in 599 plots in herbaceous ecosystems. 
The selected ecosystems range from wet to moist conditions and include grass- 
lands, fens, bogs, marshes, reedland, floodplains and dune slacks in nine countries 
in Eurasia. As N:P stoichiometry is our focus, K-limited plots (N:K ratio >2.1 and 
K:P ratio <3.41; n = 60) were excluded from further analysis. Of the total 491 
vascular plant species recorded, 172 endangered species were identified from the 
combined Red Lists of seven of the countries. We examined the effects of N:P ratio 
on biodiversity indices (number of species, number and percentage of endangered 
species) and community-mean values of 15 functional traits and Grime’s CSR 
(competitor, stress tolerator, ruderal) strategy of 446 herbaceous species retrieved 
from trait databases (see Extended Data Table 1). In all analyses, the confounding 
effects of productivity on the variables of interest were statistically removed (Extended 
Data Fig. 1). For the effects of N:P ratio on biodiversity indices, quantile regression 
analysis was carried out between N:P ratios (corrected for productivity) and bio- 
diversity indices for the 0.50 to 0.95 quantiles. The impact of N:P ratio on community- 
mean trait values (both corrected for productivity) was tested using path analysis, 
and the relative contributions of productivity and N:P ratio to community-mean 
trait values were quantified by partitioning the explained variance of traits to unique 
and shared effects of productivity and N:P ratio. Furthermore, differences between 
endangered and non-endangered species in terms of their functional trait composi- 
tion were examined with a PERMANOVA. In addition, for each of 15 traits and CSR 
strategy, the differences between endangered and non-endangered species were 
examined by Cohen’s d for continuous traits and log-odds ratio for binary traits. 
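Two of the steps summarized above lend themselves to a short sketch: excluding K-limited plots with the stated thresholds (N:K > 2.1 and K:P < 3.41), and removing the effect of productivity by taking residuals from a linear regression. The simple least-squares residualization is an assumption about the exact model form, and all names and numbers are illustrative.

```python
def is_k_limited(n, p, k):
    """Apply the stated thresholds: N:K ratio > 2.1 and K:P ratio < 3.41."""
    return n / k > 2.1 and k / p < 3.41


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    count = len(x)
    mx, my = sum(x) / count, sum(y) / count
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def residuals(x, y):
    """Part of y not linearly explained by x (e.g. x = productivity)."""
    slope, intercept = linear_fit(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical plots: productivity, N:P ratio and a community-mean trait.
productivity = [1.0, 2.0, 3.0, 4.0]
np_ratio = [3.0, 5.0, 7.0, 9.0]
trait = [0.9, 2.1, 2.9, 4.1]

np_resid = residuals(productivity, np_ratio)
trait_resid = residuals(productivity, trait)
```

The two residual series can then be related to each other, so that any association between them is not driven by their common dependence on productivity.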
METHODS


Plot selection. We collected vegetation data on 599 plots from herbaceous eco- 
systems, including the 276 sites studied in ref. 2. The sample consists of grasslands, 
fens, bogs, marshes, reed beds and dune-slack vegetation. All plots are non-brackish, 
with moist to wet conditions, thus impacts of drought and salinity on species richness
were avoided. Only 19 plots are lightly fertilized (<100 kg N ha⁻¹ yr⁻¹), and
102 plots are exposed to periodic river flooding. All plots are dominated by herba- 
ceous species (>50% cover). The plots were selected to span a wide geographical 
range of Eurasian countries: The Netherlands (255 plots), Poland (153), Russia 
(82), Germany (43), Belgium (20), Iceland (17), Sweden (10), Scotland (10) and 
Belarus (9). These plots encompass most of the west and central European low- 
lands. The flora in the country-wise partial data sets we collected is similar: on 
average, 86% (and a minimum of 70%) of the species recorded in any country had 
also been recorded in the other countries. For each plot, aboveground standing 
biomass of vascular plants was harvested at the peak of the growing season; that is, 
from June to August. The harvested area ranged from 0.06 m² to 1 m². The content
of N, P and K in the biomass was analysed after Kjeldahl digestion. Composition
of vascular plant species was recorded in or around the harvested area in plots of
0.06 to 25 m². The different plot sizes did not affect the relationships between
biodiversity and N:P stoichiometry (see Supplementary Discussion 3). 
Functional characteristics of species. Per species, we quantified those functional 
traits available in trait databases for most of the recorded plant species that are
known to be important for the growth or reproduction strategies of herbaceous
plant species (see Extended Data Table 1 for an overview of traits, units and
sources). We excluded woody species (45 species out of the total 491 species) from 
the trait analysis, as most woody species recorded in our plots were seedlings and 
therefore the trait values available in databases (which are for adult individuals) are 
not relevant. The selected traits were those related to competition for light (canopy 
height, leaf mass), leaf economy traits (specific leaf area (SLA), leaf dry-matter 
content (LDMC)), seed traits (seed mass, number of seeds per individual, seed 
investment (that is, seed mass per individual, calculated as seed mass times number 
of seeds per individual)), phenology traits (starting month of flowering, duration of 
flowering period), reproduction strategy traits (lateral spread, type of reproduction,
plant lifespan), a plant architecture trait (that is, eudicots or monocots), and a
nutrient acquisition trait (N fixation) (Extended Data Table 1). We note that seed 
investment is a trait that may be biased by the size of the plant. However, even when 
we corrected seed investment roughly for plant size, the relationship between N:P 
ratio and this trait did not change (Supplementary Discussion 4). Duration of 
flowering period (expressed in months) and starting month of flowering (ranging 
from January to August, and thus coded 1 to 8) were treated as continuous vari- 
ables. All continuous traits except LDMC and starting month of flowering were 
log-transformed to adjust the right-skewed frequency distributions. Type of repro- 
duction, expressed on an ordinal scale with five classes, was converted into two 
binary variables (reproduction by seeds and vegetative reproduction) to be used in 
subsequent analyses. Additionally, the CSR strategy was attributed to each species
by using seven traits (canopy height, LDMC, leaf mass, SLA, flowering period,
starting month of flowering, and lateral spread) according to a method described
previously. When this method could not be applied because trait information
was incomplete, the CSR classification in BiolFlor was used. Ultimately, CSR
classes were defined for 408 out of 446 herbaceous species. For each species, scores
for each primary component (C, competitor; S, stress tolerator; R, ruderal) were
assigned from its proportional contributions (for example, C scores are 1 for the
strategy ‘C’, 0.5 for ‘CS’, 0.33 for ‘CSR’ and 0.75 for ‘C/CR’). In addition, we used
a principal component analysis (PCA) to identify the major axes of variation in 
multiple functional traits (see Supplementary Discussion 2). 
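The scoring of CSR strategies by proportional contribution can be sketched as below. The rule implemented here (equal weight per letter within a code, equal weight per alternative separated by "/") reproduces the four examples given in the text; extending it to every other strategy code is an assumption.

```python
def csr_scores(strategy):
    """Proportional C, S and R scores for a strategy code such as 'C/CR'."""
    scores = {"C": 0.0, "S": 0.0, "R": 0.0}
    alternatives = strategy.upper().split("/")
    for code in alternatives:
        if not code or any(letter not in scores for letter in code):
            raise ValueError(f"unrecognized strategy code: {strategy!r}")
        for letter in code:
            # Each alternative shares one unit equally among its letters.
            scores[letter] += 1.0 / len(code) / len(alternatives)
    return scores


examples = {code: csr_scores(code)["C"] for code in ("C", "CS", "CSR", "C/CR")}
```

For any valid code the three scores sum to 1, so community means of C, S and R scores stay on a common scale.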

For each plot, community-mean values (unweighted for the abundance of species) 
of continuous traits were calculated as an indicator of the mean response of the 
plant community to the site conditions. We treated the community-mean values 
of lateral spread, an ordinal trait, in the same way as those of continuous traits, 
because they were approximately normally distributed. For binary traits, the number 
of species with 1s and 0s were counted per plot. Plots with fewer than three species 
with a valid trait value, and plots in which less than 50% of occurring herbaceous 
species had a valid trait value, were omitted from the analysis. The omitted plots 
ranged from 6 to 16 plots (average 10.7 plots) of the total (539 plots). 
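The community-mean calculation and the two omission rules (fewer than three species with a valid trait value, or valid values for less than 50% of the species in the plot) can be sketched as follows. The data structures and species labels are hypothetical; only the thresholds come from the text.

```python
from statistics import mean


def community_mean(trait_by_species, plot_species,
                   min_species=3, min_coverage=0.5):
    """Unweighted mean trait value for one plot, or None if the plot is omitted.

    trait_by_species maps species name to a trait value (None if missing).
    """
    values = [trait_by_species[s] for s in plot_species
              if trait_by_species.get(s) is not None]
    if len(values) < min_species:
        return None  # fewer than three species with a valid trait value
    if len(values) < min_coverage * len(plot_species):
        return None  # valid values for less than 50% of the species
    return mean(values)


# Hypothetical trait table and plots.
traits = {"sp_a": 1.0, "sp_b": 2.0, "sp_c": 3.0, "sp_d": None}
kept = community_mean(traits, ["sp_a", "sp_b", "sp_c", "sp_d"])
too_few = community_mean(traits, ["sp_a", "sp_b"])
low_coverage = community_mean(
    traits, ["sp_a", "sp_b", "sp_c", "sp_d", "sp_e", "sp_f", "sp_g"])
```

Because the mean is unweighted, a rare species counts as much as a dominant one, in line with the presence-based community means described above.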
Endangered species. We compiled a list of endangered species by combining the 
regional Red Lists of the Netherlands, Germany, Poland, Sweden, UK,
Iceland and the Novosibirsk region in Siberia. The Red List of Belgium was
not included, because the Belgian plots were near the border with the Netherlands 
and their flora was comparable with the flora in the Dutch plots. We also did not 
use the list of Belarus, because the number of plots in this country was small 
(n = 9) and the flora in these plots overlapped with those of Poland. The Red 
List status of a species reflects both the decline of the habitat in the region and the 
susceptibility of the species to the changing environment. Some species are Red 
Listed in one country but not in others, because in those countries their habitat is
not deteriorating. For our analysis, only the susceptibility of an endangered species 
is relevant, not the region-specific habitat deterioration, because we are interested 
in the mechanisms whereby species become endangered (that is, their functional 
traits). Therefore, we identified a species as ‘endangered’ if it is on at least one 
regional Red List (meaning that this species has fragile characteristics that are 
susceptible to environmental changes), and applied this new list to all plots. In this 
way we corrected for habitat loss, which is largely region-specific. Note that we included 
the categories which refer to actual decline of the species (‘critically endangered’, 
‘endangered’ and ‘vulnerable’ species), but not the category which refers to the scar- 
city of the species (‘rare’). In this way, we excluded species which are always rare 
irrespective of environmental change. Of our 491 vascular plant species, we iden- 
tified 172 endangered species (157 herbaceous and 15 non-herbaceous species). For 
each plot, we counted the number and percentage of endangered species. The latter is
the number of endangered species divided by the total number of species (× 100)
per plot.
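
The pooling rule and the per-plot indices can be sketched as below. The dictionary layout, the region and species placeholders, and the function names are hypothetical; only the three decline categories and the percentage definition come from the text.

```python
# Categories that refer to actual decline; 'rare' is deliberately excluded.
DECLINE_CATEGORIES = {"critically endangered", "endangered", "vulnerable"}


def endangered_set(red_lists):
    """Species listed in a decline category on at least one regional Red List.

    red_lists maps region -> {species: category}.
    """
    return {
        species
        for listing in red_lists.values()
        for species, category in listing.items()
        if category in DECLINE_CATEGORIES
    }


def plot_endangered(species_in_plot, endangered):
    """Return (number, percentage) of endangered species in one plot."""
    n = sum(1 for species in species_in_plot if species in endangered)
    return n, 100.0 * n / len(species_in_plot)


# Placeholder data, not taken from the study.
lists = {
    "region_1": {"sp_a": "endangered", "sp_b": "rare"},
    "region_2": {"sp_a": "rare", "sp_c": "vulnerable"},
}
print(plot_endangered(["sp_a", "sp_b", "sp_c", "sp_d"], endangered_set(lists)))
```

Here `sp_a` counts as endangered in every plot because it is in a decline category on one list, while `sp_b` (only ever ‘rare’) does not.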

Data analysis approach. We are interested in the effects of nutrient stoichiometry 
(that is, the ratio between N and P availability for plants) on species diversity and 
on prevailing functional traits of species in plant communities, irrespective of 
confounding effects of overall nutrient availability. We use N:P ratios of above- 
ground plant biomass as a proxy for nutrient stoichiometry (see Supplementary 
Discussion 1 for a justification of using plant N:P ratio), and site productivity (that 
is, aboveground biomass of vascular plants) as a proxy for overall nutrient avail- 
ability. As the consideration of N:P ratio is relevant only when a plot is limited or 
co-limited by N or P, we excluded K-limited plots (n = 60) from the analysis. We
considered a plot as K-limited if the N:K ratio was more than 2.1 and the K:P ratio 
was less than 3.4 (ref. 1). 
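
As a small illustration, the K-limitation criterion can be written as a predicate. The function name is hypothetical, and the three nutrient concentrations are assumed to be expressed in the same mass units so that the ratios are dimensionless.

```python
def is_k_limited(n, p, k):
    """True if a plot counts as K-limited: N:K ratio > 2.1 and K:P ratio < 3.4.

    n, p, k are N, P and K concentrations of aboveground biomass (same units).
    """
    return (n / k) > 2.1 and (k / p) < 3.4


print(is_k_limited(n=20.0, p=2.0, k=5.0))    # N:K = 4.0, K:P = 2.5
print(is_k_limited(n=10.0, p=1.0, k=10.0))   # N:K = 1.0, K:P = 10.0
```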

We propose the following relationships between plant N:P ratio, site productivity,
species diversity and community-mean species traits (see also Extended Data
Fig. 1). Plant N:P ratio and site productivity are related through a scaling law proposed
by the growth rate hypothesis (arrow a in Extended Data Fig. 1): plants grown
in fertile environments (these tend to be fast-growing species) exhibit low biomass
N:P ratios because of the high amount of P-rich RNA needed for rapid division of
cells (see Supplementary Discussion 1 for more details). We are aware of the
opposite direction of effect too (that is, N:P ratio influencing site productivity), 
particularly at extreme values of N:P ratios, where a deficiency of N or P limits site 
productivity, but we consider this effect to be minor. Furthermore, we posit that 
both species diversity and community-mean trait values are influenced by site pro- 
ductivity and N:P ratio. However, there is an intrinsic difference between how 
these drivers affect species richness and how they affect community-mean traits. 
Site productivity and N:P ratio ‘filter’ the community-mean traits (arrows e and d, 
respectively, in Extended Data Fig. 1b), acting on the mean of the community- 
mean trait values. In contrast, site productivity and N:P ratio ‘limit’ species diver- 
sity (arrows c and b, respectively, in Extended Data Fig. 1a), acting on the upper 
values of species diversity. Given the different nature of the relationships, we 
employed two sets of methods to eliminate potential confounding factors from 
our analysis. All analyses were performed in R.
Effects of N:P ratio on species diversity. A relationship between a dependent 
variable (species diversity) and one or more limiting factors (site productivity and 
N:P ratio) can be tested with quantile regression analysis. Unlike conventional 
regression, which considers solely changes in the mean of the response variable, 
quantile regression excludes the effect of unmeasured limiting factors”. It is there- 
fore a powerful method to analyse the change in the potential species diversity as a 
function of the limiting factor; that is, N:P ratio, only. 

We used three indices to assess species diversity: number of species ($y_1$), number
of endangered species ($y_2$), and percentage of endangered species over total number
of species ($y_3$, which equals $100 \cdot y_2 / y_1$). For the number of species and
number of endangered species, we assumed a quadratic effect of N:P ratio, since
previous studies suggested that on an N:P ratio gradient there is an optimum
biodiversity rather than a continuously increasing biodiversity. Both response
variables were log-transformed (that is, $\ln(y_1)$ and $\ln(y_2 + 1)$) to correct the
right-skewed frequency distributions. For the percentage of endangered species, a logistic
quantile regression analysis was applied to restrict the prediction to between 0
and 100. Here we used ‘empirical logits’, $\ln((y_2 + 0.5)/(y_1 - y_2 + 0.5))$, instead of
normal logits, $\ln(y_2/(y_1 - y_2))$, to enable computation of logits when a plot has
0% or 100% of endangered species. Effects of N:P ratio on species diversity indices 
(arrow b in Extended Data Fig. 1a), irrespective of the confounding effects of site 
productivity on N:P ratio (arrow a in Extended Data Fig. 1a), were examined by 
using ‘the residual values of N:P ratio versus productivity’ (obtained from a linear 
regression model on a log-log scale) as an explanatory variable. 
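
The empirical logit described above can be sketched in a few lines. The function names are hypothetical, and the back-transformation shown is the ordinary inverse logit, included only to illustrate how a fitted value maps back into the 0 to 100 range.

```python
import math


def empirical_logit(n_endangered, n_species):
    """Empirical logit ln((y2 + 0.5) / (y1 - y2 + 0.5)).

    Stays finite when a plot has 0% or 100% endangered species.
    """
    return math.log((n_endangered + 0.5) / (n_species - n_endangered + 0.5))


def logit_to_percentage(z):
    """Map a value on the logit scale back to a percentage between 0 and 100."""
    return 100.0 / (1.0 + math.exp(-z))


print(empirical_logit(0, 10))    # finite, although 0% of species are endangered
print(empirical_logit(10, 10))   # finite, although 100% are endangered
print(logit_to_percentage(empirical_logit(5, 10)))
```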

For each diversity index, the tth quantile regression function B(t) was determined,
such that approximately a proportion t of the observations is found
below B(t) (ref. 55). As we are interested in the upper boundary of the relationship
when N:P ratio actively limits biodiversity, we examined high values of t only
(t > 0.50). To evaluate the precision of the obtained model, 95% confidence intervals
of the coefficients were computed with the rank inversion method (Extended
Data Fig. 2). These analyses were performed with the R package ‘quantreg’.
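
To illustrate the idea of a tth quantile function, the sketch below uses the asymmetric ‘pinball’ loss that quantile regression minimizes, here for the simplest possible model (a constant). It is not the ‘quantreg’ implementation and does not fit the quadratic or logistic models used in the study; the function names are hypothetical.

```python
def pinball_loss(residuals, t):
    """Asymmetric absolute loss: weight t above the line, (1 - t) below it."""
    return sum(t * r if r >= 0 else (t - 1.0) * r for r in residuals)


def best_constant(y, t):
    """Constant that minimizes the pinball loss, searched over observed values."""
    return min(sorted(y), key=lambda c: pinball_loss([v - c for v in y], t))


values = list(range(1, 11))           # 1, 2, ..., 10
print(best_constant(values, 0.75))    # about 75% of the values lie below it
```

With a high t, observations above the fitted value are penalized more heavily than those below, which pushes the fit towards the upper boundary of the data.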

The analyses were run for the complete data set. A control run was carried out 
on the data set excluding two dominant habitat types (fens and bogs). This tested 
whether our analysis results could have been biased by habitats that may be 
severely deteriorated (for example, fens and bogs in western Europe with desicca- 
tion problems) or that have intrinsically high or low N:P ratios (for example, bogs 
tend to have high N:P ratios because this rainwater-fed ecosystem receives relatively 
high N supply through atmospheric deposition but poor P supply) (Extended Data 
Fig. 3). We did not find evidence for such bias. 

We also tested whether other factors potentially related to P limitation (for
example, soil acidity, soil moisture) confound the association between N:P ratio
and richness of endangered species (Supplementary Discussion 5), but this was not
the case.
Effects of N:P ratio on community-mean traits. Our aim was twofold: to examine, 
first, the significance of N:P ratios in affecting community-mean trait values; and 
second, the explained variation in community-mean trait values by N:P ratios in 
addition to and in interaction with productivity. 

First, we tested the association between N:P ratio and community-mean traits in
the proposed relationships between N:P ratio, productivity and community-mean
traits (Extended Data Fig. 1b) using concepts of path modelling. Note that we
assumed the causal model depicted in Extended Data Fig. 1b to be true, even
though causality as such cannot be tested in this model because the hypothesized
model is ‘saturated’ (that is, all the possible interconnections are specified).
However, it is still possible to refute the existence of a relationship between N:P
ratio and community-mean trait values (arrow d in Extended Data Fig. 1b) by
testing the significance of this association while statistically holding productivity
constant. More concretely, the association between N:P ratio and community-mean
values of a trait was examined from the relationship between the residuals of
community-mean trait values regressed by productivity ($\mathrm{res}_{\mathrm{trait},P}$) and the residuals
of N:P ratio regressed by productivity ($\mathrm{res}_{NP,P}$). When the regression coefficient of
$\mathrm{res}_{\mathrm{trait},P}$ regressed by $\mathrm{res}_{NP,P}$ was not significantly different from zero, we concluded
that there was no significant association between N:P ratio and the community-mean
trait. $\mathrm{res}_{\mathrm{trait},P}$ were derived from logistic models with negative binomial distribution
for the binary traits, and from linear models for continuous traits, using
the generalized linear model (GLM) (Extended Data Fig. 4). To test the relationship
between $\mathrm{res}_{\mathrm{trait},P}$ and $\mathrm{res}_{NP,P}$, we assumed linear regression models (after
square-root-transformation of $\mathrm{res}_{\mathrm{trait},P}$ for lifespan and log-transformation of $\mathrm{res}_{\mathrm{trait},P}$ for N
fixation). The residuals of the linear regression models did not deviate from a normal
distribution (P > 0.05 with Kolmogorov-Smirnov test except for SLA (P = 0.004),
lateral spread (P = 0.046) and N fixation (P = 0.01); for these three traits we also
checked the $\mathrm{res}_{\mathrm{trait},P}$ versus $\mathrm{res}_{NP,P}$ relationship with Spearman’s correlation analysis: our
conclusions were identical). To visualize the effects of N:P ratio on community-mean
traits (Fig. 2 and Fig. 3), we have shown $\mathrm{res}_{\mathrm{trait},P}$ against $\mathrm{res}_{NP,P}$ and the
significance and strength of this relationship (β, the standardized regression
coefficient). In a path analysis context, β corresponds to a relationship between N:P
ratio and community-mean traits for which direction of the causality is not prescribed.
The significance levels of the βs and the path coefficients (which prescribe the
direction of causality) were nearly identical (results not shown), showing that our
conclusion on the existence of association between community-mean traits and
N:P ratio is robust against the assumed direction of the causality.
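
The residual-on-residual step can be sketched for the linear (continuous-trait) case as follows. This is a simplified illustration with ordinary least squares only: it returns an unstandardized slope rather than the standardized coefficient reported in the figures, it assumes the inputs are already on the transformed scales used in the analysis, and it omits the GLM used for binary traits. All function names are hypothetical.

```python
def ols(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals of y after regressing it on x."""
    intercept, slope = ols(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def partial_slope(productivity, np_ratio, trait):
    """Slope of trait residuals on N:P residuals, both taken against productivity."""
    res_np = residuals(productivity, np_ratio)
    res_trait = residuals(productivity, trait)
    return ols(res_np, res_trait)[1]


# Synthetic example: trait = 2 * productivity + 3 * N:P ratio (no noise).
prod = [1.0, 2.0, 3.0, 4.0, 5.0]
npr = [2.0, 1.0, 4.0, 3.0, 6.0]
trait = [2.0 * p + 3.0 * q for p, q in zip(prod, npr)]
print(partial_slope(prod, npr, trait))  # close to 3.0
```

In this synthetic case the slope recovers the N:P effect with productivity held constant, even though N:P ratio and productivity are correlated.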

Second, to examine the explained variation in community-mean trait values by 
N:P ratios in addition to and in interaction with productivity, we partitioned the 
variation in community-mean trait values into values explained by unique effects 
of N:P ratio and productivity and their shared effects. We did so by sequentially 
calculating the measures of goodness-of-fit (G) of regression models”. In this way, 
the relative contribution of N:P ratio and productivity on community-mean trait 
values can be quantified irrespective of the underlying multicollinearity among 
three variables. As a measure of G, we used log likelihood for logistic models
and R² for linear models. Unique effects of N:P ratio were calculated as an increase
in G in the model of community-mean traits regressed by productivity and N:P
ratio ($G(X_1, X_2)$) compared to that regressed by productivity only ($G(X_2)$). Unique
effects of productivity were calculated identically, as an increase in $G(X_1, X_2)$
compared to $G(X_1)$. Shared effects of productivity and N:P ratio were calculated as:
$G(X_1) + G(X_2) - G(X_1, X_2)$.

For some traits, shared effects were negative, indicating that the trait is sup- 
pressed by interaction between productivity and N:P ratio. 
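
The partitioning arithmetic described above amounts to three subtractions. The sketch below takes goodness-of-fit values that have already been computed (for example R² values) and uses descriptive argument names instead of numbered predictors; the function name and the numbers in the example are illustrative only.

```python
def partition_fit(g_np, g_prod, g_both):
    """Split goodness-of-fit into unique and shared parts.

    g_np:   fit of the model with N:P ratio only
    g_prod: fit of the model with productivity only
    g_both: fit of the model with both predictors
    Returns (unique N:P effect, unique productivity effect, shared effect).
    """
    unique_np = g_both - g_prod
    unique_prod = g_both - g_np
    shared = g_np + g_prod - g_both
    return unique_np, unique_prod, shared


print(partition_fit(g_np=0.30, g_prod=0.20, g_both=0.40))
print(partition_fit(g_np=0.10, g_prod=0.10, g_both=0.30))  # negative shared effect
```

The three parts always sum to the fit of the full model; the shared part becomes negative when the two predictors together explain more than the sum of what each explains alone.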

Traits of endangered species versus non-endangered species. The difference in a
suite of functional traits (11 traits including all the continuous- and ordinal-scale traits)
between endangered and non-endangered species was tested with PERMANOVA
using the R package ‘vegan’ for 283 herbaceous species (for which full data were
available for the 11 traits). The distance matrices were calculated based on Gower’s
distance. The relative difference in each functional trait between endangered and 
non-endangered species was then quantified by calculating the effect size. Positive 
values for the effect size mean that endangered species have higher trait values than 
non-endangered species. For continuous traits, we used Cohen’s d as effect size 
measure. Canopy height, leaf mass, SLA, seed mass, number of seeds, seed investment,
and flowering period were log-transformed before the analysis. The equations
of Cohen’s d and its 95% confidence intervals are shown in Supplementary
equation (1). For traits that were only marginally normally distributed (flowering
start, flowering period, C, S, R scores; P < 0.05 with Kolmogorov-Smirnov test),
the difference between endangered and non-endangered species was also tested
with the Mann-Whitney U-test; the conclusions were identical. For binary traits,
the effect size was examined by the log-odds ratio. The equations of log-odds ratio
and its 95% confidence intervals are shown in Supplementary equation (2). Note
that the scales of Cohen’s d and log-odds ratio are different, so the effect size of
continuous and binary traits cannot be compared. For ordinal traits, effect size
cannot be calculated. For these traits, the Mann-Whitney U-test was used to test
the difference between endangered and non-endangered species.
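
The two effect-size measures can be sketched with their common textbook definitions. The exact equations used in the study are in the Supplementary Information and are not reproduced here, so the pooled-standard-deviation form of Cohen’s d and the 2 × 2 form of the log-odds ratio below are assumptions, as are the function names.

```python
import math


def cohens_d(endangered, others):
    """Cohen's d with a pooled standard deviation (assumed form).

    Positive values mean endangered species have the higher trait mean.
    """
    n1, n2 = len(endangered), len(others)
    m1, m2 = sum(endangered) / n1, sum(others) / n2
    v1 = sum((x - m1) ** 2 for x in endangered) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in others) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd


def log_odds_ratio(end_1, end_0, other_1, other_0):
    """Log-odds ratio from counts of species with trait value 1 or 0.

    end_1, end_0: endangered species with 1 and 0; other_1, other_0: the rest.
    """
    return math.log((end_1 * other_0) / (end_0 * other_1))


print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
print(log_odds_ratio(20, 10, 10, 20))
```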
Extended Data Figure 1 | Data analysis approach. a, b, Schematic proposed
relationships between site productivity (that is, aboveground biomass of
vascular plants; X2), N:P ratio in aboveground plant biomass (X1), and
species diversity (a; X3) or community-mean traits (b; X4). Solid arrows are
relationships in which the explanatory variable is constrained by the response
variable (direct causality); dashed arrows are relationships in which upper
bound of the explanatory variable is constrained by the response variable
(limitation). Arrow a represents the pattern predicted by the growth rate
hypothesis (see Supplementary Discussion 1 for details). The effect of N:P
ratio on species diversity (arrow b) was tested by quantile regression analysis
(thus treating arrow c as another limiting factor) with the residual values of X1
versus X2 as an explanatory variable (thus removing the effect illustrated by
arrow a). The effect of N:P ratio on a community-mean trait (arrow d) was
tested by comparing the residual values of X1 versus X2 (thus removing
the effect illustrated by arrow a) with the residual values of X4 versus X2
(thus removing the effect illustrated by arrow e), using concepts of
path analysis.
Extended Data Figure 2 | Ninety-five per cent confidence intervals of the
quantile regression coefficients. a-c, Estimates (dots) and 95% confidence
intervals (bars) of quadratic and linear coefficients ($b_2$ and $b_1$, respectively)
of quantile regression models are shown for the number of vascular plant
species (a), the number of endangered species (b), and the percentage of
endangered species (c) regressed by N:P ratio corrected for productivity effects.
The fitted models were $\ln(y_1) = b_0 + b_1 x + b_2 x^2$ for number of species ($y_1$);
$\ln(y_2 + 1) = b_0 + b_1 x + b_2 x^2$ for number of endangered species ($y_2$); and
$\ln((y_2 + 0.5)/(y_1 - y_2 + 0.5)) = b_0 + b_1 x$ for percentage of
endangered species ($y_3 = 100 \cdot y_2 / y_1$), where x is the residuals of plant N:P ratio regressed by
productivity. Models were examined for 50% (t = 0.50) to 95% (t = 0.95)
quantiles. See Fig. 1 for the shape of the quantile regression models for t = 0.50,
0.75, 0.90, 0.95.
Extended Data Figure 3 | Effects of habitat types on relationships between
residual N:P ratio and biodiversity indices. Relationships between N:P ratio
corrected for productivity effects and the number of endangered species (a)
and percentage of endangered species (b) are shown for different habitat types
(left, 187 fens; middle, 56 bogs; and right, 296 other habitat types). Linear,
rather than quadratic, quantile regression models were applied because for
most quantiles the quadratic coefficients did not differ significantly from zero.
tth linear quantile regression models (t = 0.50, 0.75, 0.90, 0.95) are shown only
when the 95% confidence intervals of the linear coefficients of the regression
models were above or below zero for the majority of the quantiles. Number and
percentage of endangered species increased concomitantly with increasing
N:P ratio (corrected for productivity) even in plots that are not fens and
bogs, indicating that our findings on the relationship between N:P ratio and
endangered species were not an artefact resulting from the stratified sampling
of habitat types.
Extended Data Figure 4 | Relationships between community-mean trait
values and plant N:P ratio. a-r, The tested traits are canopy height (a, number
of sites (n) = 530), leaf mass (b, n = 525), specific leaf area (c, n = 529),
leaf dry-matter content (d, n = 525), seed mass (e, n = 533), seed number per
shoot (f, n = 524), seed investment (g, n = 523), starting month of flowering
(h, n = 528), flowering period (i, n = 528), lateral spread (j, n = 526),
reproduction by seeds (k, n = 528), vegetative reproduction (l, n = 528), life
span (m, n = 531), plant architecture (n, n = 533), N fixation (o, n = 502),
C score (p, n = 528), S score (q, n = 528) and R score (r, n = 528). See Extended
Data Table 1 for abbreviations and units of the traits. Canopy height, leaf mass,
specific leaf area, seed mass, number of seeds, seed investment, and flowering
period were log-transformed before the calculation of community-mean
values. For binary traits, plot mean values were shown as a fraction of species
with 1s over total species (that is, sum of 1s and 0s) to allow graphical
presentation. Standardized regression coefficients (β) of community-mean
trait regressed by N:P ratio using GLM and their two-tailed P values
(***P < 0.001, **P < 0.01, *P < 0.05) are shown.
Extended Data Table 1 | List of functional traits of herbaceous vascular plant species. 14 species functional traits were retrieved from trait
databases, and 2 binary traits (‘Reproduction by seeds’ and ‘Vegetative reproduction’) were derived from an ordinal trait (‘Type of reproduction’).
For regression analyses of community-mean trait values (Fig. 2) and for effect-size calculation (Fig. 4), we did not use ‘Type of reproduction’ but
the two derived binary traits instead. The traits were retrieved for 446 herbaceous vascular plant species. Canopy height, leaf mass, SLA, seed
mass, number of seeds, seed investment, and flowering period were log-transformed before all analyses.

| Trait | Scale and unit | % of species with a trait value | Source |
|---|---|---|---|
| Canopy height | Continuous; m | 93 | 32, 33 |
| Leaf mass | Continuous; mg | 81 | 33 |
| Specific leaf area (SLA) | Continuous; mm2/mg | 84 | 33 |
| Leaf dry matter content (LDMC) | Continuous; % | 81 | 33 |
| Seed mass | Continuous; mg per seed | 83 | 33, 36, 38 |
| Number of seeds | Continuous; number per shoot | 76 | 33 |
| Seed investment | Continuous; mg per shoot | 73 | Seed mass × Number of seeds |
| Starting month of flowering | Continuous*†; month | 94 | 32, 36 |
| Duration of flowering period | Continuous*; month | 94 | 32, 36 |
| Lateral spread | Ordinal‡; 0: annuals, 1: <0.01 m, 2: 0.01-0.25 m, 3: ≥0.25 m | 85 | 34, 35 |
| Type of reproduction§ | Ordinal; 1: s, 0.75: ssv, 0.5: sv, 0.25: vvs, 0: v | 90 | 36 |
| Reproduction by seeds | Binary; 1: yes (s/ssv/sv), 0: no or seldom (vvs/v) | 90 | Derived from ‘Type of reproduction’ |
| Vegetative reproduction | Binary; 1: yes (v/vvs/sv), 0: no or seldom (ssv/s) | 90 | Derived from ‘Type of reproduction’ |
| Life span | Binary; 1: annual or biennial, 0: perennial | 95 | 32, 33, 36 |
| Plant architecture | Binary; 1: monocot, 0: eudicot | 97 | 32 |
| N-fixation | Binary; 1: nodulated legume, 0: others | 100 | 37 |

* Month is strictly speaking an ordinal scale, but treated here as a continuous scale.
† Range from January to August, thus coded as 1 to 8.
‡ Community-mean values of lateral spread were treated as continuous variables, as lateral spread does not deviate from normal distribution.
§ Expressed as the relative dependency on seed reproduction. The original categories in the database are: s (by seeds), ssv (mostly by seeds), sv (both by seeds and vegetatively), vvs (mostly vegetatively), v (vegetatively).
Upper Palaeolithic Siberian genome reveals dual
ancestry of Native Americans 


Maanasa Raghavan, Pontus Skoglund, Kelly E. Graf, Mait Metspalu, Anders Albrechtsen, Ida Moltke,
Simon Rasmussen, Thomas W. Stafford Jr, Ludovic Orlando, Ene Metspalu, Monika Karmin, Kristiina Tambets,
Siiri Rootsi, Reedik Magi, Paula F. Campos, Elena Balanovska, Oleg Balanovsky, Elza Khusnutdinova,
Sergey Litvinov, Ludmila P. Osipova, Sardana A. Fedorova, Mikhail I. Voevoda, Michael DeGiorgio,
Thomas Sicheritz-Ponten, Soren Brunak, Svetlana Demeshchenko, Toomas Kivisild, Richard Villems,
Rasmus Nielsen, Mattias Jakobsson & Eske Willerslev


The origins of the First Americans remain contentious. Although 
Native Americans seem to be genetically most closely related to east 
Asians, there is no consensus with regard to which specific Old
World populations they are closest to. Here we sequence the draft
genome of an approximately 24,000-year-old individual (MA-1), 
from Mal’ta in south-central Siberia, to an average depth of 1×.
To our knowledge this is the oldest anatomically modern human 
genome reported to date. The MA-1 mitochondrial genome belongs 
to haplogroup U, which has also been found at high frequency among 
Upper Palaeolithic and Mesolithic European hunter-gatherers,
and the Y chromosome of MA-1 is basal to modern-day western Eurasians
and near the root of most Native American lineages. Similarly,
we find autosomal evidence that MA-1 is basal to modern-day west- 
ern Eurasians and genetically closely related to modern-day Native 
Americans, with no close affinity to east Asians. This suggests that 
populations related to contemporary western Eurasians had a more 
north-easterly distribution 24,000 years ago than commonly thought. 
Furthermore, we estimate that 14 to 38% of Native American ancestry 
may originate through gene flow from this ancient population. This 
is likely to have occurred after the divergence of Native American 
ancestors from east Asian ancestors, but before the diversification 
of Native American populations in the New World. Gene flow from 
the MA-1 lineage into Native American ancestors could explain why 
several crania from the First Americans have been reported as bear- 
ing morphological characteristics that do not resemble those of east 
Asians. Sequencing of another south-central Siberian, Afontova
Gora-2 dating to approximately 17,000 years ago, revealed similar
autosomal genetic signatures as MA-1, suggesting that the region 
was continuously occupied by humans throughout the Last Glacial 
Maximum. Our findings reveal that western Eurasian genetic sig- 
natures in modern-day Native Americans derive not only from post- 
Columbian admixture, as commonly thought, but also from a mixed 
ancestry of the First Americans. 

In 2009 we visited Hermitage State Museum in St. Petersburg, Russia,
and sampled skeletal remains of a juvenile individual (MA-1) from the
Mal’ta Upper Palaeolithic site in south-central Siberia. Mal’ta, located
along the Belaya River near Lake Baikal, was excavated between 1928
and 1958 (ref. 9) and yielded a plethora of archaeological finds including
30 anthropomorphic Venus figurines, which are rare for Siberia but
found at a number of Upper Palaeolithic sites across western Eurasia
(Fig. 1a and Supplementary Information, section 1). Accelerator mass
spectrometry (AMS) 14C dating of MA-1 produced an age of 20,240 ± 60
14C years before present or 24,423-23,891 calendar years before present
(cal. BP) (Supplementary Information, section 2).

DNA from 0.15 g of bone from MA-1 was sequenced to an average 
depth of 1X (Supplementary Information, section 3). From one library 
(referred to as MA-1_1st extraction in Supplementary Information,
section 3.1), approximately 17% of the total reads generated mapped 
uniquely to the human genome, in agreement with good DNA preser- 
vation (see Supplementary Information Table 2). Low contamination 
rates were inferred for both mitochondrial DNA (mtDNA) (1.1%) and 
the X chromosome (1.6 to 2%; MA-1 is male) (Supplementary Infor- 
mation, section 5). The overall error rate for the data set was estimated 
to be 0.27%, with the most dominant errors being transitions typical 
of ancient DNA damage deriving from post-mortem deamination of 
cytosine (Supplementary Information, section 6.1).

Phylogenetic analysis of the MA-1 mtDNA genome (76.6×) places
it within mtDNA haplogroup U without affiliation to any known sub- 
clades, implying a lineage that is rare or extinct in sampled modern 
populations (Supplementary Information, section 7 and Supplemen- 
tary Fig. 4a). Present-day distribution of haplogroup U encompasses a 
large area including North Africa, the Middle East, south and central 
Asia, western Siberia and Europe (Supplementary Fig. 4b), although it 
is rare or absent east of the Altai Mountains; that is, in populations 
living in the region surrounding Mal’ta. Haplogroup U has also been 
found at high frequency (>80%) in ancient hunter-gatherers from Upper 
Palaeolithic and Mesolithic Europe. Our result therefore suggests a
connection between pre-agricultural Europe and Upper Palaeolithic 
Siberia. The Y chromosome of MA-1 was sequenced to an average 
depth of 1.5X, with coverage across 5.8 million bases. Acknowledging 
Figure 1 | Sample locations and MA-1 genetic affinities. a, Geographical
locations of Mal’ta and Afontova Gora-2 in south-central Siberia. For reference,
Palaeolithic sites with individuals belonging to mtDNA haplogroup U are
shown (red and black triangles): 1, Oberkassel; 2, Hohle Fels; 3, Dolni
Vestonice; 4, Kostenki-14. A Palaeolithic site with an individual belonging to
mtDNA haplogroup B is represented by the square: 5, Tianyuan Cave. Notable
Palaeolithic sites with Venus figurines are marked by brown circles: 6, Laussel;
7, Lespugue; 8, Grimaldi; 9, Willendorf; 10, Gargarino. Other notable
Palaeolithic sites are shown by grey circles: 11, Sungir; 12, Yana RHS. b, PCA
(PC1 versus PC2) of MA-1 and worldwide human populations for which
genomic tracts from recent European admixture in American and Siberian
populations have been excluded. c, Heat map of the statistic f3(Yoruba;
MA-1, X) where X is one of 147 worldwide non-African populations (standard
errors shown in Supplementary Fig. 21). The graded heat key represents the
magnitude of the computed f3 statistics.


the low depth of coverage, we determined the most likely phylogenetic 
affiliation of the MA-1 Y chromosome to a basal lineage of haplogroup 
R (Supplementary Information, section 8 and Supplementary Fig. 5a). 
The extant sub-lineages of haplogroup R show regional spread patterns 
within western Eurasia, south Asia and also extend to the Altai region 
in southern Siberia (Supplementary Fig. 5b). The sister lineage to these 
extant sub-lineages of haplogroup R, haplogroup Q, is the most common
haplogroup in Native Americans and it was recently shown that,
in Eurasia, haplogroup Q lineages closest to Native Americans are found
in southern Altai.
To get an overview of the genomic signature of MA-1, we conducted
principal component analysis (PCA) using a large data set from worldwide
human populations for which genomic tracts of recent European
admixture in American and Siberian populations have been excluded
(Supplementary Information, section 10). In the first two principal components,
MA-1 is intermediate between modern western Eurasians and
Native Americans, but distant from east Asians (Fig. 1b). To investigate
the relationship of MA-1 to global human populations in further detail,
we used the f-statistics framework to compute an ‘outgroup’ f3-statistic,
which is expected to be proportional to the amount of shared genetic
history between MA-1 and each of 147 non-African populations from a
large worldwide human single-nucleotide polymorphism (SNP) array
data set (see Supplementary Information, section 14.2 for details on the
f3-statistics). We find that genetic affinity to MA-1 is greatest in two
regions: first, the Americas; and second, northeast Europe and north- 
west Siberia, with north-to-south latitudinal clines in shared drift with 
MA-1 in both Europe and Asia (Fig. 1c and Supplementary Figs 21 and 22). 
Notably, the lack of genetic affinity between MA-1 and most popula- 
tions in south-central Siberia today suggests that there was substan- 
tial gene flow into the region after the Last Glacial Maximum (LGM), 
most probably from east Asian sources (Supplementary Information,
section 9.1.3). 
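
As a rough illustration of what an outgroup f3-statistic measures, the sketch below computes the plain allele-frequency form, the average over SNPs of (outgroup − A) × (outgroup − B). It is a simplified sketch and not the procedure of Supplementary Information section 14.2: it omits finite-sample corrections and standard errors, and the function name and the toy frequencies are hypothetical.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Uncorrected f3(outgroup; A, B) from per-SNP allele frequencies.

    Each argument is a list of allele frequencies (0 to 1), one per SNP,
    in the same SNP order. Larger values indicate more shared drift
    between A and B relative to the outgroup.
    """
    terms = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(terms) / len(terms)


# Toy frequencies for two SNPs (not real data).
print(outgroup_f3([0.5, 0.5], [0.7, 0.3], [0.7, 0.3]))  # A and B drifted together
print(outgroup_f3([0.5, 0.5], [0.7, 0.3], [0.5, 0.5]))  # B shares no drift with A
```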

We reconstructed admixture graphs using TreeMix to relate the
population history of MA-1 to 11 modern genomes from worldwide
populations, 4 new genomes from Eurasia (Mari, Avar, Indian and
Tajik ancestry) and the Denisova genome (Supplementary Information,
section 11). The maximum-likelihood population tree inferred
without admixture events places MA-1 on a branch that is basal to western
Eurasians (Supplementary Fig. 12). However, a significant residual was 
observed between the empirical covariance for MA-1 and Karitiana, a 
Native American population, and the covariance predicted by the tree 
model (Supplementary Fig. 12). Consequently, gene flow between these 
lineages was inferred in all graphs incorporating two or more migra- 
tion events (Fig. 2 and Supplementary Fig. 13). Bootstrap support for 
the migration edge from MA-1 to Karitiana, rather than from Karitiana 
to MA-1, was 99% in this analysis. 

We investigated further the population history of MA-1 by conducting
sequence read-based D-statistic tests on proposed tree-like
histories comprising MA-1 and combinations of 11 modern genomes 
(Supplementary Information, section 13). In agreement with the TreeMix 
results, these tests reject the tree ((X, Han), MA-1) where X represents 
Avar, French, Indian, Mari, Sardinian and Tajik, consistent with the 
MA-1 lineage sharing more recent ancestry with the western Eurasian 
branch after the split of Europeans and east Asians (Supplementary 
Table 13). This result also holds true when the Han Chinese is replaced 
with Dai, another east Asian population (Supplementary Table 13). 
Notably, we can also reject the tree ((Han, Karitiana), MA-1) (Z = 10.8), 
suggesting gene flow between MA-1 and ancestral Native Americans, 
in accordance with the admixture graphs (Supplementary Table 13). 
This result is consistent with allele frequency-based D-statistic tests
on SNP arrays for 48 Native American populations of entirely First
American ancestry, indicating that all tested populations are equally
related to MA-1 and that the admixture event occurred before the 
population diversification of the First American gene pool (Fig. 3a, 
Supplementary Information, section 14.4 and Supplementary Fig. 24). 

The genetic affinity between Native Americans and MA-1 could be 
explained by gene flow after the split between east Asians and Native 
Americans, either from the MA-1 lineage into Native American ancestors 
or from Native American ancestors to the ancestors of MA-1. However, 
MA-1, at approximately 24,000 cal. BP, pre-dates time estimates of 
the Native American-east Asian population divergence event”*”*. This 
presents little time for the formation of a diverged Native American 
gene pool that could have contributed ancestry to MA-1, suggesting 
gene flow from the MA-1 lineage into Native American ancestors. Such 
gene flow should also be detectable using modern-day western Eurasian 
populations in place of MA-1. 

Figure 2 | Admixture graph for MA-1 and 16 complete genomes. An 
admixture graph with two migration edges (depicted by arrows) was fitted 
using TreeMix”' to relate MA-1 to 11 modern genomes from worldwide 
populations”, 4 modern genomes produced in this study (Avar, Mari, Indian 
and Tajik), and the Denisova genome”. Trees without migration, graphs with 
different number of migration edges, and residual matrices are shown in 
Supplementary Information, section 11. The drift parameter is proportional to 
2N_e generations, where N_e is the effective population size. The migration weight 
represents the fraction of ancestry derived from the migration edge. The 
scale bar shows ten times the average standard error (s.e.) of the entries in the 
sample covariance matrix. Note that the length of the branch leading to MA-1 is 
affected by this ancient genome being represented by haploid genotypes. 

Consistent with this, D-statistic tests 
estimated from outgroup-ascertained SNP data” reveal significant 
evidence (Z > 3) for Middle Eastern, European, central Asian and 
south Asian populations being closer to Karitiana than to Han Chinese”® 
(Fig. 3b and Supplementary Information, section 14.5). Similar signals 
were also observed when we replaced modern-day Han Chinese with 
data from chromosome 21 from a 40,000-year-old east Asian indi- 
vidual (Tianyuan Cave, China), which has been found to be ancestral 
to modern-day Asians and Native Americans”® (Supplementary Infor- 
mation, section 14.5). Thus, if the gene flow direction was from Native 
Americans into western Eurasians it would have had to spread subse- 
quently to European, Middle Eastern, south Asian and central Asian 
populations, including MA-1 before 24,000 years ago. Moreover, as 
Native Americans are closer to Han Chinese than to Papuans (Fig. 3c), 
Native American-related gene flow into the ancestors of MA-1 is expected 
to result in MA-1 also being closer to Han Chinese than to Papuans. 
However, our results suggest that this is not the case (D(Papuan, Han; 
Sardinian, MA-1) = −0.002 ± 0.005 (Z = −0.36)), which is compatible 
with all or almost all of the gene flow being into Native Americans 
(Supplementary Information, section 14.6). Similar results are obtained 
when MA-1 is replaced with most modern-day western Eurasian popula- 
tions, except populations with recent admixture from east Asia (Russian, 
Adygei and Burusho) and Africa (Middle Eastern populations) (Fig. 3c). 
The most parsimonious explanation for these results is that Native 
Americans have mixed origins, resulting from admixture between peoples 
related to modern-day east Asians and western Eurasians. Admixture 
graphs fitted with MixMapper” model Karitiana as having 14-38% 
western Eurasian ancestry and 62-86% east Asian ancestry, but we 
caution that these estimates assume unadmixed ancestral populations 
(Supplementary Information, section 12). 

Importantly, in addition to the low contamination rates and rare or 
extinct uniparental lineages, we exclude modern DNA contamination 
as being the source of the observed population affinities of MA-1 for 
three reasons. First, we corrected the sequence read-based D-statistics 
tests for differing amounts of contamination, using a European indi- 
vidual as the contamination source (Supplementary Information, sec- 
tion 13.5). We find similar outcomes for corrected and uncorrected 
tests (Supplementary Fig. 20), even when contamination levels larger 
than that estimated for MA-1 are considered, confirming that our results 
are not affected by contamination from a European source. Second, 
restricting the PCA to sequences with evidence of post-mortem degra- 
dation gives results that are comparable with those using the complete 
data set (Supplementary Information, section 15). Finally, the genome 
sequence of the researcher (Indian ancestry) who carried out DNA 
extraction and library preparation of MA-1 enables us to exclude the 
researcher as a source of contamination (Supplementary Information, 
sections 11 and 13). In addition, we exclude post-Columbian European 
admixture (after 1492 AD) as an explanation for the genetic affinity 
between MA-1 and Native Americans for three reasons. First, for SNP 
array-based analyses, we take recent European admixture into account 
by using a data set masked for inferred admixed genomic regions”. 
Second, allele frequency-based D-statistic tests” show that all 48 tested 
modern-day populations with First American ancestry”? are equally 
related to MA-1 within the resolution of our data (Supplementary Infor- 
mation, section 14.4), which would not be expected if the signal was 
driven by recent European admixture. Third, MA-1 is closer to Native 
Americans than any of the 15 tested European populations (Supplemen- 
tary Information, section 14.8). 

Human dispersals in northeast Asia immediately before and after 
the LGM are most likely to have led to the settlement of Beringia, and 
ultimately the Americas*®. As MA-1 pre-dates the LGM, we investi- 
gated whether the genetic composition of southern Siberia changed 
during the LGM by generating a low-coverage data set (~0.1×) of a 
post-LGM individual from Afontova Gora-2 (AG-2) (ref. 14), located 
on the western bank of the Enisei River in south-central Siberia (Fig. 1a). 
We obtained a direct AMS 14C date of 13,810 ± 35 14C years before 
present or 17,075-16,750 cal. BP for AG-2 (Supplementary Information, 
section 2). Despite substantial present-day DNA contamination in this 
sample (Supplementary Information, section 5), we find that AG-2 shows 
close similarity to the genetic profile of MA-1 on a PCA (Supplemen- 
tary Information, section 15 and Supplementary Fig. 29) and is sig- 
nificantly closer to Karitiana than to Han (D(Yoruba, AG-2; Han, 
Karitiana) = 0.078 ± 0.004, Z = 19.9) (Supplementary Information, 
section 15). We observe consistent results when restricting analyses to 
sequences with evidence of post-mortem degradation (Supplementary 
Information, section 15 and Supplementary Fig. 29), implying that 
southern Siberia may have experienced genetic continuity through 
the environmentally harsh LGM. 

Our study has four important implications. First, we find evidence 
that contemporary Native Americans and western Eurasians share 
ancestry through gene flow from a Siberian Upper Palaeolithic popu- 
lation into First Americans. Second, our findings may provide an 
explanation for the presence of mtDNA haplogroup X in Native 
Americans, which is related to western Eurasians but not found in east 
Asian populations”. Third, such an easterly presence in Asia of a popu- 
lation related to contemporary western Eurasians provides a possibility 
that non-east Asian cranial characteristics of the First Americans” 
derived from the Old World via migration through Beringia, rather than 
by a trans-Atlantic voyage from Iberia as proposed by the Solutrean 
hypothesis**. Fourth, the presence of an ancient western Eurasian 
genomic signature in the Baikal area before and after the LGM suggests 
that parts of south-central Siberia were occupied by humans throughout 
the coldest stages of the last ice age. 

A humerus from Mal’ta (MA-1) and a humerus from Afontova Gora-2 (AG-2) 
were sampled at the Hermitage Museum, St. Petersburg, for ancient DNA and 
radiocarbon dating analyses. Informed consent and institutional approval were 
obtained for the genome sequencing of modern individuals (Avar, Mari, Tajik and 
Indian ancestry) in accordance with human demographic studies, with ethical 
approval from The National Committee on Health Research Ethics, Denmark 
(H-3-2012-FSP21). Ancient and modern DNA extracts were built into Illumina 
libraries and sequenced on the Illumina HiSeq and MiSeq platforms (Supplemen- 
tary Information, section 3). Reads were mapped to the human reference genome 
builds hg18 and 37.1 (Supplementary Information, section 4). Principal component 
analysis was carried out to investigate affinities to modern populations (Supplemen- 
tary Information, sections 10 and 15). Admixture graphs were fitted to the observed 
allele frequencies using TreeMix”' (Supplementary Information, section 11) and 
MixMapper” (Supplementary Information, section 12). Tree-like population models 
were tested using D-statistics based on both sequence read data” (Supplementary 
Information, section 13) and allele frequency data from SNP arrays” (Supplemen- 
tary Information, section 14.3). Shared genetic drift with modern populations was 
estimated using f3-statistics”” (Supplementary Information, section 14). 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 

METHODS 

Samples. A humerus (MA-1) from Mal’ta and a humerus (AG-2) from Afontova 
Gora-2 were sampled at the Hermitage Museum, St. Petersburg, Russia in 2009 for 
ancient DNA analysis and accelerator mass spectrometry (AMS) 14C dating. In 
addition, four modern human samples (Avar, Mari, Tajik and Indian) were 
obtained for genome sequencing in accordance with informed consent require- 
ments for human demographic studies. Ethical approval for genome sequencing of 
the above four modern samples was acquired from The National Committee on 
Health Research Ethics, Denmark (H-3-2012-FSP21). 

Radiocarbon dating. AMS 14C dating was carried out on the two ancient bone 
samples following standard protocols*'*” (Supplementary Information, section 2). 
Contemporary 14C standards included National Bureau of Standards Oxalic Acid-I 
and ANU sucrose. Respective chemistry and combustion backgrounds were 
determined by using >70,000-year-old collagen isolated from the fossil Eschrichtius 
robustus (grey whale)***? and Sigma Aldrich L-Alanine (catalogue number A7627). 
The graphitized samples and standards were analysed at the University of California- 
Irvine WM Keck Carbon Cycle Accelerator Mass Spectrometry Laboratory 
(UCIAMS). The 14C dates were calibrated using OxCal 4.2 (ref. 34) and the 
INTCAL09 data set*’. 

Genome sequencing and read processing. DNA extractions and library con- 
structions for the ancient samples were performed in a laboratory facility ded- 
icated to the analysis of ancient DNA (Centre for GeoGenetics, Copenhagen). 
Bone powder from MA-1 and AG-2 (149 mg and 119 mg, respectively) was 
extracted using a silica spin-column protocol'’***” (Supplementary Information, 
section 3.1.1). Undigested pellets were subject to another round of digestion. Blood 
samples from one individual each of Avar, Mari and Tajik ancestry were extracted 
using standard protocol** (Supplementary Information, section 3.2.2). A saliva 
sample from an individual of Indian ancestry was extracted using a prepIT·L2P 
extraction kit (DNA Genotek) (Supplementary Information, section 3.2.2). Illumina 
libraries were constructed on the ancient and modern extracts (Supplementary 
Information, sections 3.1.2 and 3.2.3). The protocols outlined in the kit manuals 
(GS FLX Titanium Rapid Library Preparation Kit, 454 Life Sciences, Roche, 
Branford, CT and NEBNext DNA Sample Prep Master Mix Set 2, New England 
Biolabs, E6070) as well as in a previous paper” were followed. Equimolar pools of 
the ancient (100 cycles, single-read mode) and modern libraries (100 cycles, 
paired-end mode) were sequenced on the Illumina HiSeq 2000 at the Danish 
National High-Throughput DNA Sequencing Centre. The ancient libraries were 
sequenced to near-saturation. 

Read processing was performed on the ancient and modern genomes produced 
in this study as well as previously published genomes (Supplementary Information, 
sections 4.1 and 4.2). The latter genomes included 11 high-coverage modern 
genomes”, one low-coverage Cambodian genome”, and the Denisovan” and 
Tianyuan’® ancient data sets. All sequences were trimmed using AdapterRemoval’ 
and mapped to the human reference genome builds hg18 and 37.1 using the 
Burrows-Wheeler Aligner (BWA)”. The seed length option was disabled for ancient 
reads to optimize the mapping efficiency. Polymerase chain reaction (PCR) 
duplicates were removed using Picard MarkDuplicates (http://picard.sourceforge.net). 
All modern samples (except the Cambodian genome) and the Denisova individual 
were genotyped using samtools mpileup and bcftools™, and filtered to achieve a 
high-confidence SNP set (Supplementary Information, section 4.2). Only bi-allelic 
sites were included when producing the final call set and the individual calls were 
merged to a final set using Genome Analysis Toolkit (GATK) CombineVariants- 
2.5-2 (ref. 45). 
Contamination and error rate estimation. Mitochondrial DNA (mtDNA) con- 
tamination rates for MA-1 and AG-2 were estimated by identifying consensus calls 
in the ancient mtDNA data set that are private or near-private to the ancient 
individual (at an allele frequency of less than 1% in a set of 311 modern human 
mtDNA genomes)*° (Supplementary Information, section 5.1). The near-private 
consensus alleles and potential contaminating reads at these positions were 
counted, and a 95% confidence interval was obtained assuming that the allele 
observed in each read is a random outcome of drawing one of two alleles (endo- 
genous and contaminant). Positions with a depth of less than 10X were excluded, 
as were positions where the consensus allele was either C or G in a transition 
polymorphism, as these are sensitive to post-mortem nucleotide misincorpora- 
tions. A phred-scaled base quality of 30 was required. 
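
The read-counting step above can be sketched as follows. This is a minimal illustration, assuming reads at the near-private positions have already been tallied as matching or not matching the consensus allele. The function name, the example counts and the choice of a Wilson score interval are assumptions made here; the text specifies only that a 95% confidence interval was obtained under a two-outcome sampling model.

```python
import math

def contamination_estimate(consensus_reads, other_reads, z=1.96):
    """Contaminant read fraction with a Wilson score interval (z=1.96 for 95%).

    consensus_reads: reads carrying the near-private consensus (endogenous) allele
    other_reads:     reads carrying a different allele (potential contaminants)
    """
    n = consensus_reads + other_reads
    if n == 0:
        raise ValueError("no reads at the informative positions")
    p = other_reads / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical counts, not taken from the study
rate, low, high = contamination_estimate(consensus_reads=990, other_reads=10)
```

The interval choice matters most when very few contaminating reads are observed, which is why a score interval is used here rather than a simple normal approximation.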

As we found both individuals (MA-1 and AG-2) to be males by comparing the 
number of alignments to the X and Y chromosomes” (Supplementary Information, 
section 4.3), it was possible to obtain X chromosome-based contamination 
estimates using previously published methods** (Supplementary Information, sec- 
tion 5.2). These estimates were based on a fixed set of SNPs known to be poly- 
morphic in European HapMap phase II release 27 data*’. This SNP data set was 
pruned such that polymorphic sites were more than 10 bases apart. The same 
HapMap data was used for estimating allele frequencies in Europeans. The MA-1 
and AG-2 data sets were filtered to remove: regions homologous between the X and 
Y chromosomes; reads mapping non-uniquely to multiple regions of the genome 
with more than 98% identity; reads with mapping quality score less than 30 and 
base quality score less than 20; and sites with a read depth of less than 3 (or 2 
depending on library depth) or above 40. 

The error rates for the sequenced ancient and modern libraries were estimated 
using a method similar to a previously published method” that makes use of a high 
quality genome (Supplementary Information, section 6.1). The estimates were based 
on the rationale that any given human sample should have the same expected 
number of derived alleles compared to some outgroup, in this case the chimpanzee, 
panTro2, from the multiway alignment hg19 multiz46. The numbers of derived 
alleles were counted from the high-quality genome (individual NA06985 from the 
1000 Genomes Project Consortium”) and the error rate estimates were based on 
the assumption that any excess of derived alleles (compared to the high quality 
genome) observed in our sample is due to errors. The overall error rates were 
estimated using a method-of-moments estimator, while the type-specific error rates 
were estimated using a maximum likelihood approach. The model and the estima- 
tion methods are described in detail elsewhere”. All reads with a mapping quality 
score less than 30 and all bases with a base quality score less than 20 were excluded. 
mtDNA and Y-chromosome haplogroup determination. Sequence reads from 
MA-1 were mapped to the revised Cambridge Reference Sequence (rCRS, 
NC_012920.1) and filtered for PCR duplicates and paralogues, requiring a minimum 
mapping quality of 25 (Supplementary Information, section 4.1). A file of variants, 
filtered for a minimum depth of 10, was generated (Supplementary Information, 
section 7). Indels were excluded from the analysis. The mtDNA sequence from the 
individual Dolni Vestonice 14 (DV-14; GenBank accession number KC521458), 
basal to the extant mtDNA haplogroup U5 (ref. 12), was included in the analysis 
for comparison. Both the MA-1 and DV-14 mtDNA sequences were analysed for 
the presence of diagnostic mutations of the major sub-haplogroups of extant mtDNA 
haplogroup U lineages, using information from mtDNA tree Build 15 (Sept 30, 
2012)°!. A phylogenetic tree including all major extant branches of mtDNA 
haplogroup U was built, with the age estimates (kiloyears ± s.d.) of the different 
sub-haplogroups” (Supplementary Fig. 4a). To show the present spread of haplogroup 
U and its different sub-haplogroups, the average frequencies, divided into four 
frequency classes, were calculated in regional groups, using a data set consisting of 
approximately 30,000 partial mtDNA genomes (references in Supplementary 
Information, section 7). 

Owing to low depth of coverage of the MA-1 individual, genotyping at each site 
on the Y chromosome was performed by selecting the allele with the highest 
frequency of bases with a base quality of 13 or higher (Supplementary Information, 
section 8). A multi-fasta file was generated from the variable positions on the Y 
chromosomes available from 24 Complete Genomics public genomes**. SNPs were 
filtered for quality (using the threshold VQHIGH as defined by Complete 
Genomics), with tri-allelic positions excluded and only Y-chromosome regions 
determined as phylogenetically informative being used™*. This yielded a final data 
set of 22,492 positions that was merged with MA-1 Y chromosome data. A neigh- 
bour joining tree with default parameters in MEGA phylogenetic software** was 
constructed (Supplementary Fig. 5a). Phylogenetically informative positions and 
their state in MA-1 were then determined to confirm the placement of MA-1 on 
the tree. Non-informative positions, including those with more than four Ns in the 
public data set, were excluded (633 positions). Moreover, positions in the following 
categories were also excluded: reference state in all individuals, including MA-1 
(7,172 positions); N in MA-1 and either N or reference state among the rest of the 
individuals (9,682 positions); ‘N-ref’, those with only N or reference state among all 
individuals (586 positions); ‘N-alt’, positions with alternative alleles, but difficult 
to classify (11 positions); ‘reference-specific’ (79 positions); and ‘recurrent’ 
(28 positions). This resulted in 4,301 positions being retained that were classified 
according to their haplogroup affiliations. Among those phylogenetically inform- 
ative positions, 1,889 non-N positions were retrieved from MA-1. 
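
As an illustration of the two steps in this paragraph, the sketch below calls the most frequent allele among bases passing the quality threshold and then tallies the exclusion categories to confirm the retained count. The function name and the tie-breaking rule (first allele seen wins) are assumptions made here; the position counts are those quoted in the text.

```python
from collections import Counter

def call_majority_allele(bases, quals, min_qual=13):
    """Return the most frequent base among those with quality >= min_qual, or 'N'."""
    kept = [b for b, q in zip(bases, quals) if q >= min_qual]
    if not kept:
        return "N"
    # Ties are resolved in favour of the allele seen first (an arbitrary choice)
    return Counter(kept).most_common(1)[0][0]

# Position counts quoted in the text
total_positions = 22_492
excluded = {
    "non-informative": 633,
    "reference state in all individuals": 7_172,
    "N in MA-1, N or reference elsewhere": 9_682,
    "N-ref": 586,
    "N-alt": 11,
    "reference-specific": 79,
    "recurrent": 28,
}
retained = total_positions - sum(excluded.values())
```

The tally reproduces the 4,301 retained positions reported above.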

Principal component analysis. A single read was sampled from each position in 
the MA-1 data set, which overlapped with SNPs in a data set compiled from a 
previous paper” in which the authors had used local ancestry inference to mask 
segments of European and African ancestry in Siberian and Native American 
populations** (Supplementary Information, section 10). A phred-scaled mapping 
quality of 30 and base quality score of 30 were required in the sequence data for 
a haploid genotype to be called, and reads with indels were excluded. SNPs with 
minor allele frequency of <1% in the total data set were removed. To reduce the 
effect of nucleotide misincorporations, the first and last three bases of each 
sequence read in the MA-1 data were excluded. SNPs where there was no informa- 
tion from MA-1 were excluded, and a single haploid genotype was randomly 
sampled from each modern individual to match the single-pass nature of the 
shotgun data®’. PCA was performed on various population subsets separately 
using EIGENSOFT 4.0 (ref. 61), removing one SNP from each pair for which 
linkage disequilibrium exceeded a low arbitrary threshold (r² > 0.2). Transition 
SNPs, where the ancient individual displayed a T or an A®, as well as triallelic 
SNPs, were excluded. 
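
The haploid sampling described in this paragraph can be sketched as below. This is a minimal illustration under stated assumptions: the record layout (a dict per read observation with the keys shown) and the function name are hypothetical, while the thresholds (mapping quality 30, base quality 30, three bases trimmed from each read end, reads with indels excluded) follow the text.

```python
import random

def sample_haploid_genotype(observations, read_trim=3, min_mapq=30,
                            min_baseq=30, rng=random):
    """Pick one base at random from the reads covering a SNP.

    Each observation is a dict with hypothetical keys:
      base, mapq, baseq, pos_in_read (0-based), read_len, has_indel
    Returns None when no observation passes the filters.
    """
    usable = [
        o["base"] for o in observations
        if o["mapq"] >= min_mapq
        and o["baseq"] >= min_baseq
        and not o["has_indel"]
        # Skip the first and last `read_trim` bases of each read
        and read_trim <= o["pos_in_read"] < o["read_len"] - read_trim
    ]
    return rng.choice(usable) if usable else None
```

Sampling a single read, rather than calling a diploid genotype, keeps heterozygous sites from being systematically favoured or missed at low coverage.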

To look more closely at the genetic affinities of AG-2 to modern-day populations, 
data from non-African populations” were used as a reference panel and 
PCA was performed as detailed above (Supplementary Information, section 15). 
To compare the PCA results from MA-1 and AG-2, Procrustes transformation 
was performed as described in a previous paper”, rotating the PC1-PC2 config- 
urations obtained for the two individuals to the configuration obtained using only 
the reference panel (Supplementary Information, section 15). The analysis was 
repeated using only those sequences which displayed a C > T mismatch consist- 
ent with post-mortem ancient DNA nucleotide misincorporations (PMD) in the 
first five bases of the sequence read (requiring a base quality of at least 30) (Sup- 
plementary Information, section 15). 
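
For readers unfamiliar with the Procrustes step, a minimal two-dimensional sketch is given below. It assumes matched lists of (PC1, PC2) coordinates for the same reference individuals in two configurations, centres and scales both, and finds the rotation that best superimposes one on the other. The function names are hypothetical, and the sketch considers rotation only; a full Procrustes analysis may also allow reflection, and the procedure in the cited paper may differ in detail.

```python
import math

def _centre_and_scale(points):
    """Translate to zero mean and scale to unit Frobenius norm."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centred))
    return [(x / norm, y / norm) for x, y in centred]

def procrustes_rotate(source, target):
    """Rotate `source` onto `target`; returns (rotated, angle, residual)."""
    src = _centre_and_scale(source)
    tgt = _centre_and_scale(target)
    # Closed-form optimal rotation angle in two dimensions
    a = sum(x * u + y * v for (x, y), (u, v) in zip(src, tgt))
    b = sum(x * v - y * u for (x, y), (u, v) in zip(src, tgt))
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(x * c - y * s, x * s + y * c) for x, y in src]
    residual = sum((rx - u) ** 2 + (ry - v) ** 2
                   for (rx, ry), (u, v) in zip(rotated, tgt))
    return rotated, theta, residual
```

A residual near zero indicates that the two configurations differ only by translation, scale and rotation.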
Admixture graph inference. To infer admixture graphs, a total of 17 individuals 
were used: the archaic Denisova genome”; 11 present-day individuals”; the 4 
novel genomes from this study (Supplementary Information, section 4.2); and 
the MA-1 genome (Supplementary Information, section 11). Haploid genotypes 
from MA-1 were added to variants identified in the other individuals, as in the 
PCA analysis to alleviate the increased rate of errors in low-coverage ancient DNA 
sequence data. If multiple sequence reads overlapped a position, one read was 
randomly sampled”’. This avoids biasing for, or against, heterozygotes and renders 
the MA-1 data haploid. All transition SNPs were excluded and MA-1 sequence 
reads with a mapping quality less than 30 and bases with base quality less than 30 
were discarded. Positions at which there was no data from one of the individuals in 
the analysis were also excluded. This resulted in a final count of 156,250 SNPs for 
the main analysis. TreeMix*! (version 1.12) was used to build ancestry graphs 
assuming 0 to 10 migration edges, the placement and weight of each being opti- 
mized by the algorithm. TreeMix was run using the ‘-global’ option, which corre- 
sponds to performing a round of global rearrangements of the graph after initial 
fitting. Sample size correction was also disabled, as all the populations consisted of 
single individuals (‘-noss’). Standard errors were estimated in blocks with 500 
SNPs in each. For those analyses that included one or more a priori specified 
events, a round of optimization was performed on the original migration edge 
(option ‘-climb’). 
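
The series of runs described above could be scripted along the following lines. This sketch only builds the argument lists and does not execute TreeMix. The input and output file names are hypothetical, and mapping the 500-SNP blocks to the `-k` option is an assumption based on the TreeMix command-line options; the `-global` and `-noss` options are those named in the text.

```python
def treemix_command(input_path, output_prefix, n_migrations, block_size=500):
    """Build an argument list for one TreeMix run (not executed here)."""
    return [
        "treemix",
        "-i", input_path,                          # gzipped allele-count input
        "-o", f"{output_prefix}.m{n_migrations}",  # output file stem
        "-m", str(n_migrations),                   # number of migration edges
        "-k", str(block_size),                     # SNPs per block for standard errors
        "-global",                                 # global rearrangements after fitting
        "-noss",                                   # disable sample size correction
    ]

# One run for each of 0 to 10 migration edges; file names are placeholders
commands = [
    treemix_command("ma1_panel.treemix.gz", "ma1_graph", m)
    for m in range(0, 11)
]
```

Each list can be passed to `subprocess.run` once the input file exists.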

Admixture graphs relating MA-1 to modern groups were also inferred using 
MixMapper v1.0 (ref. 27) (Supplementary Information, section 12). A scaffold tree 
was constructed using four African genomes (San, Yoruba, Mandenka, Dinka), 
and Sardinian and Han” genomes, to which MA-1 and other genomes were fitted. 
All transitions were excluded, and standard errors of the f-statistics were estimated 
using 500 bootstrap replicates over 50 blocks of the autosomal genome. 
D-statistics. To investigate the relationship between MA-1 and a number of 
modern populations, a sequence read-based D-statistic test (ABBA-BABA test’), 
equivalent to previously published tests’, was applied to sequencing data from a 
single genome from each of the populations of interest (Supplementary Information, 
section 13). MA-1 and 11 high-coverage present-day genomes were included in 
this test. For the chimpanzee outgroup, the multiway alignment, which includes 
both chimpanzee and human (panTro2 from the hg19 multiz46), was used. The 
data were filtered as follows before calculating the sequence read-based D-statistic”’. 
First, all reads with mapping quality below 30 were removed. Subsequently, bases 
of low quality were removed by dividing all bases into eight base categories: A, C, 
G, T on the plus strand and A, C, G, T on the minus strand. The lowest-scoring 
50% of bases from each of the eight categories were then discarded. More specif- 
ically, within each base category, we found the highest base quality score, Q, for 
which less than half of the bases in the base category had a quality score smaller 
than Q. We then removed all bases with quality score smaller than Q, and ran- 
domly sampled and removed bases with quality score equal to Q until 50% of the 
bases from the base category had been removed in total. The data were filtered 
separately for each of the eight base categories to avoid bias in the test in case of 
significant difference in the base quality between the categories. After filtering, a 
single base was sampled at each site for each individual in order to avoid intro- 
ducing bias due to differences in sequencing depth. Finally, all sites containing 
transitions were removed. Based on the filtered data, D-statistics were calculated 
and to assess if these were significantly different from 0, standard errors and Z 
scores were obtained using a method known as ‘delete-m Jackknife for unequal m’, 
with a block size of 5 megabases®. 
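
The base-quality filter described above can be sketched as follows. This is a minimal illustration, assuming each observed base is available as a (base, strand, quality) tuple; the function name and the rounding down of the removal count for categories with an odd number of bases are assumptions made here.

```python
import random
from collections import defaultdict

def filter_lowest_half(bases, rng=random):
    """Drop the lowest-scoring half of bases within each (base, strand) category.

    bases: list of (base, strand, quality) tuples. Returns the retained tuples.
    """
    by_category = defaultdict(list)
    for idx, (base, strand, _) in enumerate(bases):
        by_category[(base, strand)].append(idx)

    drop = set()
    for indices in by_category.values():
        n = len(indices)
        n_remove = n // 2  # rounding down for odd counts is an assumption
        quals = [bases[i][2] for i in indices]
        # Highest Q for which fewer than half of the bases score below Q
        threshold = max(q for q in set(quals)
                        if sum(x < q for x in quals) < n / 2)
        below = [i for i in indices if bases[i][2] < threshold]
        at = [i for i in indices if bases[i][2] == threshold]
        drop.update(below)
        # Randomly remove bases scoring exactly Q until half are gone
        drop.update(rng.sample(at, n_remove - len(below)))
    return [b for i, b in enumerate(bases) if i not in drop]
```

Filtering each of the eight categories separately means that a base type with systematically lower quality scores loses the same fraction of observations as the others.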

For genotype data from SNP arrays we computed an allele frequency-based 
D-statistic test, which is a generalization of the sequence read-based test (Supplemen- 
tary Information, section 14.3). We used previously presented estimators*®*, 
obtaining standard errors using a block jackknife procedure over 5-megabase 
blocks in the genome, except for the tests with the Tianyuan data (chromosome 
21), in which case we used 100-kb blocks to increase power. Two main data sets 
were used: first, a published SNP data set (364,470 SNPs) masked for European and 
African ancestries in Siberian and Native American populations’’, which was merged 
with additional data from Finnish populations™; and second, SNPs ascertained in 
San and Yoruban individuals and typed in worldwide populations”’. As the San 
and Yoruba populations are approximate outgroups to non-African populations, 
these data are unbiased for all comparisons between non-Africans. Transition SNPs 
were included but the first and last three bases of each sequence read were excluded 
since the majority of nucleotide misincorporations occur at the ends of ancient 
DNA templates (Supplementary Information, section 6.2). For other tests (in 
Supplementary Information, section 14), SNP data described in Supplementary Table 
11 were used. We sampled a single read at each position from the MA-1 data as in 
the principal component analysis. 
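
A minimal sketch of an allele frequency-based D-statistic with a weighted block jackknife is given below. It uses a commonly used form of the estimator and of the delete-m jackknife for unequal block sizes; whether these match the cited estimators in every detail, including the sign convention, is an assumption. Function names and the input layout (one list of per-SNP frequency tuples per genomic block) are hypothetical.

```python
import math

def d_statistic(freqs):
    """D for populations (1, 2; 3, 4) from per-SNP allele frequencies."""
    freqs = list(freqs)
    num = sum((p1 - p2) * (p3 - p4) for p1, p2, p3, p4 in freqs)
    den = sum((p1 + p2 - 2 * p1 * p2) * (p3 + p4 - 2 * p3 * p4)
              for p1, p2, p3, p4 in freqs)
    return num / den

def block_jackknife(blocks, statistic):
    """Delete-m jackknife for unequal block sizes.

    blocks: list of lists of per-SNP records, one list per genomic block.
    Returns (estimate, standard_error, z_score).
    """
    g = len(blocks)
    n = sum(len(b) for b in blocks)
    theta = statistic([r for b in blocks for r in b])
    # Recompute the statistic with each block left out in turn
    leave_out = [
        statistic([r for k, b in enumerate(blocks) if k != j for r in b])
        for j in range(g)
    ]
    theta_j = g * theta - sum((1 - len(b) / n) * t
                              for b, t in zip(blocks, leave_out))
    var = 0.0
    for b, t in zip(blocks, leave_out):
        h = n / len(b)
        pseudo = h * theta - (h - 1) * t
        var += (pseudo - theta_j) ** 2 / (h - 1)
    se = math.sqrt(var / g)
    return theta, se, theta / se
```

With haploid sampled genotypes the frequencies are simply 0 or 1, so the same function covers the sequence read-based case. In practice each block would hold the SNPs falling in one 5-megabase window.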

Outgroup f3-statistics. Classical measures of pairwise genetic distance, such as 
Wright’s fixation index F_ST, are sensitive to genetic drift that has occurred since the 
divergence of the two test populations. If such lineage-specific genetic drift differs 
between populations that share an equal amount of genetic history with an ancient 
individual, the ancient individual would be observed as being closer to the modern 
populations with the least degree of historical genetic drift using distance-based 
methods such as F_ST. To circumvent these issues and obtain a statistic that is informative 
of the genetic relatedness between a particular sample and each candidate 
population in a reference set, an ‘outgroup f3-statistic’ was computed (Supplementary 
Information, section 14.2). The expected value of the f3-statistic”, f3(Outgroup; 
A, B), equals the sum of expected squared change in allele frequency (normalized 
for heterozygosity in the outgroup) due to genetic drift on the path in the popu- 
lation tree from the outgroup to the root and from the root to the ancestor of 
populations A and B. As genetic drift in the lineage specific to the outgroup is 
expected to be constant regardless of which populations A and B are used (in the 
absence of gene flow), the remaining variation between statistics will depend on 
how much genetic history is shared between populations A and B. We used Yoruba 
as an outgroup to non-African populations and computed the statistic f3(Yoruba; 
MA-1, X) to investigate the shared history of MA-1 and a set of 147 worldwide 
candidate populations (as X) obtained by merging several data sets (Supplementary 
Figs 21 and 22), and we corroborated major patterns using SNPs from a San 
individual from southern Africa (Supplementary Information, section 14). 
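
The logic of the outgroup f3-statistic can be illustrated with a simplified sketch. It averages the product of allele frequency differences between the outgroup and each test population and normalizes by outgroup heterozygosity. The function name is hypothetical, and the sketch omits the finite-sample corrections that a full estimator would apply, so it should be read as an illustration of the idea and not as the estimator used in the study.

```python
def outgroup_f3(records):
    """Simplified f3(Outgroup; A, B) from per-SNP allele frequencies.

    records: sequence of (p_out, p_a, p_b) tuples, one per SNP.
    """
    records = list(records)
    num = sum((o - a) * (o - b) for o, a, b in records)
    het = sum(2 * o * (1 - o) for o, _, _ in records)  # outgroup heterozygosity
    return num / het

# Toy frequencies: A and B drift together away from the outgroup...
shared_drift = [(0.5, 0.9, 0.9), (0.5, 0.1, 0.1)]
# ...versus B staying at the outgroup frequency
no_shared_drift = [(0.5, 0.9, 0.5), (0.5, 0.1, 0.5)]
```

Larger values indicate more drift shared by A and B since their separation from the outgroup, which is how candidate populations X are ranked by their relatedness to MA-1.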

Prefrontal parvalbumin interneurons shape neuronal 
activity to drive fear expression 


Julien Courtin, Fabrice Chaudun, Robert R. Rozeske, Nikolaos Karalis, Cecilia Gonzalez-Campo, Hélène Wurtz, 
Azzedine Abdi, Jerome Baufreton, Thomas C. M. Bienvenu & Cyril Herry 


Synchronization of spiking activity in neuronal networks is a fun- 
damental process that enables the precise transmission of informa- 
tion to drive behavioural responses’ ’. In cortical areas, synchronization 
of principal-neuron spiking activity is an effective mechanism for 
information coding that is regulated by GABA (y-aminobutyric 
acid)-ergic interneurons through the generation of neuronal oscil- 
lations**. Although neuronal synchrony has been demonstrated to 
be crucial for sensory, motor and cognitive processing*®*, it has not 
been investigated at the level of defined circuits involved in the 
control of emotional behaviour. Converging evidence indicates that 
fear behaviour is regulated by the dorsomedial prefrontal cortex” 
(dmPFC). This control over fear behaviour relies on the activation 
of specific prefrontal projections to the basolateral complex of the 
amygdala (BLA), a structure that encodes associative fear mem- 
ories’*"'°, However, it remains to be established how the precise 
temporal control of fear behaviour is achieved at the level of pre- 
frontal circuits. Here we use single-unit recordings and optogenetic 
manipulations in behaving mice to show that fear expression is 
causally related to the phasic inhibition of prefrontal parvalbumin 
interneurons (PVINs). Inhibition of PVIN activity disinhibits pre- 
frontal projection neurons and synchronizes their firing by reset- 
ting local theta oscillations, leading to fear expression. Our results 
identify two complementary neuronal mechanisms mediated by 
PVINs that precisely coordinate and enhance the neuronal activity 
of prefrontal projection neurons to drive fear expression. 

To identify the prefrontal circuitry involved in conditioned fear behaviour, mice
were implanted with recording electrodes aimed at the dmPFC, and submitted to
auditory fear conditioning, a robust learning paradigm in which animals learn to
associate a neutral stimulus (the conditioned stimulus, CS) with a coincident
aversive foot-shock (the unconditioned stimulus, US) (Fig. 1a). Re-exposure to the
CS induces the expression of various conditioned fear responses, including an
immobilization reaction called freezing. Twenty-four hours after conditioning, mice
displayed a selective increase in freezing during presentations of the CS
associated with the US (CS+), which returned to baseline levels by the end of the
second extinction session (Fig. 1b). One week later, CS+ presentations induced a
selective fear recovery (Fig. 1b). Among the 732 neurons recorded in dmPFC, 493
(67.3%) displayed significant excitatory or inhibitory phasic responses to CS+
presentations following conditioning. To dissect dmPFC circuits involved in the
control of fear behaviour, we separated the CS+-responsive neurons into putative
principal neurons (PNs, n = 351) and interneurons (INs, n = 142) using unsupervised
clustering and cross-correlogram analyses (Extended Data Fig. 1). Among dmPFC INs,
principal component analyses revealed two main subclasses with opposite CS-evoked
responses during fear expression (Fig. 1c, d, and Extended Data Fig. 2a, b). Type 1
INs (n = 68) displayed short-latency, CS-evoked activity correlated with high
(CS+), but not low (CS−), fear states. Conversely, type 2 INs (n = 15) were
strongly inhibited during high but not low fear states (Fig. 1c, d). Correlation
analyses carried out between changes in activity after CS presentations and
freezing levels revealed that the firing of type 1 and type 2 INs was correlated
and inversely correlated, respectively, with freezing (Fig. 1e, f). Moreover,
latency and cross-correlation analyses of simultaneously recorded cells revealed
that the CS+-evoked excitation of type 1 INs preceded the CS+-evoked inhibition of
type 2 INs (Extended Data Fig. 2c-e).

Interestingly, whereas type 1 INs displayed moderate firing rates (16.2 ± 1.5 Hz)
and were weakly modulated with local theta oscillations, type 2 INs showed fast
firing activity (43.9 ± 9.7 Hz) and were strongly modulated with local theta,
suggesting that type 2 INs are PVINs16 (Extended Data Fig. 2f-h). To address this
possibility, we selectively infected PVINs with injections of a conditional
adeno-associated virus (AAV) encoding for archaeorhodopsin in the dmPFC of mice
expressing the Cre recombinase under the control of a PV promoter (PV-IRES-Cre;
Fig. 2a and Extended Data Fig. 3a, b). Using this strategy, we optically silenced
the firing of type 2 (n = 5/5 (5 out of 5)) but not type 1 INs (n = 0/9),
indicating that type 2 INs belong to the PVIN population (Fig. 2b). Remarkably,
among light-reactive PVINs (n = 9), only type 2 PVINs (n = 5) displayed significant
decreases in CS-evoked activity following conditioning, suggesting a functional
role of this subpopulation during fear behaviour (Extended Data Fig. 4a-d). In
summary, we identified two subclasses of dmPFC INs whose activities oppositely
correlate with fear behaviour and demonstrated that type 2 INs are PVINs.

To determine whether the CS-evoked inhibition of type 2 PVINs causes fear
expression, PV-IRES-Cre mice received intra-dmPFC injections of a conditional AAV
encoding for archaeorhodopsin or channelrhodopsin. Infection of dmPFC PVINs did not
change their electrophysiological characteristics (Extended Data Fig. 3c-e). Before
fear conditioning, optical silencing of PVINs induced freezing (Fig. 2c). Moreover,
after fear extinction, CS+ presentations coupled to optical silencing of PVINs,
including type 2 INs, consistently reinstated fear responses (Fig. 2c and Extended
Data Fig. 4c-e). Conversely, optical activation of PVINs transiently inhibited
freezing (Fig. 2d). To control that freezing induced by CS-evoked inhibition of
type 2 INs did not result from motor impairments, we optically inhibited PVINs
during a place avoidance paradigm, in which mice could actively avoid the
compartment in which they received optical silencing. Under these conditions,
optogenetic silencing of PVINs produced place aversion relative to control animals
(Extended Data Fig. 5). These data demonstrate that fear expression is causally
related to the inhibition of dmPFC PVINs, including type 2 INs.

PVINs target the perisomatic region of PNs, thereby providing powerful inhibition
of dmPFC output activity17. Therefore, CS+-evoked inhibition of PVINs during fear
behaviour might disinhibit PNs, a permissive mechanism that would gate fear
responses. Consistent with this, the vast majority of tone-reactive PNs
(n = 308/351, 87.7%) significantly increased their activity upon CS+ relative to
CS− presentations (Fig. 3a). Moreover, the optogenetic activation of PVINs
inhibited PNs, prevented CS+-induced activation of PNs and reduced freezing
(Extended Data Fig. 6a-c). Conversely, light-induced inhibition of


1INSERM, Neurocentre Magendie, U862, 146 Rue Léo-Saignat, Bordeaux 33077, France.
2University of Bordeaux, Neurocentre Magendie, U862, 146 Rue Léo-Saignat, Bordeaux 33077, France.
3University of Bordeaux, Institut des Maladies Neurodégénératives, UMR 5293, Bordeaux F-33000, France.
4CNRS, Institut des Maladies Neurodégénératives, UMR 5293, Bordeaux F-33000, France.


92 | NATURE | VOL 505 | 2 JANUARY 2014 


©2014 Macmillan Publishers Limited. All rights reserved 


[Figure 1 graphic not reproduced. Recoverable panel content follows.]

a, Protocol (Hab. and FC in the conditioning context; Post FC, Ext. and Ret. in the
extinction context):

        Day 1: Hab.   Day 1: FC   Day 2: Post FC   Day 3: Ext.   Day 10: Ret.
CS+     4 CS          5 CS-US     12 CS            12 CS         4 CS
CS−     4 CS          5 CS        4 CS             4 CS          4 CS

b, Freezing across blocks of 4 CS for CS− and CS+ (Hab., Post FC, Ext., Ret.).
c, d, Raster plots and PSTHs against time (ms, −400 to 400) around tone onset, with
mean z scores for CS− and CS+.


Figure 1 | Firing of distinct dmPFC INs oppositely correlates with fear
expression. a, Protocol. b, During habituation (Hab.), mice (n = 29) exhibited
low freezing during CS− and CS+. After fear conditioning (Post FC; the first
extinction session), CS+ (CS presentations 1-12, grouped into blocks of 4)
induced high freezing (Wilcoxon signed-rank tests, CS− versus each CS+ block;
all P < 0.001). After extinction (Ext. (the second extinction session), n = 28
mice), CS+ (CS+ 9-12; the last 4 CS+ of the extinction session) and CS− evoked
low freezing. During retrieval (Ret.), CS+ but not CS− induced fear recovery
(n = 21 mice, Wilcoxon signed-rank test, CS− versus CS+; P < 0.001). Error
bars, mean ± s.e.m. c, d, Left, raster plots and peristimulus time histograms
(PSTHs) of CS+-evoked firing for INs (type 1 and 2) during Post FC (CS+ 1-4,
108 trials). Right, mean z score of CS−- and CS+-evoked responses of type 1 and
type 2 INs during Post-FC, Ext. or Ret. sessions (CS− and CS+ 1-4, 108 trials).
Type 1 INs were excited (n = 68, 25 mice, paired t-test, CS− versus CS+,
P < 0.001), whereas type 2 INs were inhibited during CS+ (n = 15, 8 mice,
paired t-test, CS− versus CS+, P < 0.001). Bins of 10 ms. e, f, Correlations
between freezing during CS (Post-FC, Ext. or Ret. sessions, CS− and CS+ 1-4)
and CS-evoked firing (mean z score 0-150 ms post CS) for type 1 INs (n = 68,
Pearson’s r = 0.79, P < 0.01) and type 2 INs (n = 15, Pearson’s r = −0.93,
P < 0.001).


PVINs disinhibited PNs and produced freezing (Figs 2c and 3b). These 
data suggest that the increased activity of dmPFC PNs during fear 
expression results from a disinhibitory mechanism mediated by PVINs. 

As PVINs have a key role in the genesis of cortical network oscillations18,19, we
investigated whether specific changes in dmPFC local field potentials (LFPs) were
associated with different fear states. Although freezing periods were associated
with a strong reduction of LFP theta power compared to non-freezing periods
(Extended Data Fig. 7a), CS+ but not CS− presentations were associated with a
transient amplitude increase and a phase resetting of theta oscillations (Fig. 4a
and Extended Data Fig. 7b). This analysis produced similar results when restricted
to freezing and non-freezing periods during CS presentations (Extended Data
Fig. 7c, d). This observation raises the question of whether dmPFC theta phase
resetting during fear behaviour is mediated locally or imposed by a remote
structure, such as the hippocampus. To address this question we locally injected
muscimol to inactivate the medial septum, a structure that is involved in the
genesis of hippocampal theta oscillations20. Inactivation of the medial septum
reduced hippocampal

Figure 2 | Prefrontal type 2 PVINs control fear expression. a, Schematic of
light inhibition of archaeorhodopsin (ArchT)-green fluorescent protein
(GFP)-expressing PVINs (green) in dmPFC with an optic fibre coupled to the
recording electrodes (yellow). Cg1, anterior cingulate cortex; IL, infralimbic
area; PL, prelimbic area. b, PSTHs showing mean activity changes for type 1
(left, n = 9) and type 2 INs (right, n = 5) upon yellow light (yellow bars, 250 ms;
108 trials, 0.9 Hz). A Fisher exact statistical test revealed that the proportions
of the two populations were significantly different (P = 0.033). Bins of 10 ms.
c, Protocol (left panel) and behaviour from PV-IRES-Cre mice infected in
dmPFC with GFP (control, n = 8) or ArchT-GFP-expressing (n = 9) floxed
AAV viruses and submitted to yellow light. Before conditioning (Pre FC,
middle panel) and after extinction (right panel), optogenetic inhibition of
PVINs induced freezing (paired t-tests, Pre FC, GFP versus ArchT,
**P < 0.001; Extinction, GFP versus ArchT, ***P < 0.001; light-pulse
duration, 250 ms; 108 trials, 0.9 Hz). d, Protocol (left panel) and behaviour from
PV-IRES-Cre mice infected with control GFP (n = 8) or channelrhodopsin
(ChR2)-enhanced yellow fluorescent protein (eYFP)-expressing (n = 6) floxed
AAV viruses in the dmPFC and submitted to blue light. Following conditioning
(middle panel, Post FC), optogenetic activation of PVINs decreased freezing
(Post FC, GFP versus ChR2, paired t-test, **P < 0.01; light-pulse duration,
250 ms; 108 trials, 0.9 Hz). NS, not significant. Error bars, mean ± s.e.m.


theta power, whereas it did not influence freezing and had no effect on
dmPFC theta phase resetting evoked by CS+ presentations (Extended
Data Fig. 8).
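
The phase-resetting measure reported in Fig. 4 (time variance of the first theta
peak following the CS) can be illustrated with a minimal sketch. It assumes
idealized 10-Hz sinusoidal sweeps instead of recorded LFPs; the function names, the
1-ms sampling step and the synthetic phase distributions are illustrative
assumptions, not the analysis code used in the study. A lower variance of
first-peak times across sweeps indicates that the oscillation restarts at a
consistent phase.

```python
import math
import random
import statistics

DT = 0.001  # sampling step in seconds (assumed for this sketch)


def synthetic_sweep(phase, freq=10.0, duration=0.3):
    """Idealized theta-band sweep: a cosine starting at the given phase."""
    n_samples = int(round(duration / DT))
    return [math.cos(2 * math.pi * freq * i * DT + phase) for i in range(n_samples)]


def first_peak_time(sweep):
    """Time (s) of the first local maximum after sweep onset."""
    for i in range(1, len(sweep) - 1):
        if sweep[i - 1] < sweep[i] >= sweep[i + 1]:
            return i * DT
    raise ValueError("no peak found in sweep")


def peak_time_variance(phases):
    """Variance across sweeps of the first-peak time."""
    return statistics.pvariance([first_peak_time(synthetic_sweep(p)) for p in phases])


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_sweeps = 27           # 27 sweeps, as in the figure legends

# No reset: the oscillation phase at stimulus onset is arbitrary.
no_reset = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_sweeps)]
# Reset: the phase at onset is tightly clustered (0.2 rad spread, assumed).
reset = [rng.gauss(math.pi, 0.2) for _ in range(n_sweeps)]

var_no_reset = peak_time_variance(no_reset)
var_reset = peak_time_variance(reset)
print(f"no reset: {var_no_reset:.2e} s^2, reset: {var_reset:.2e} s^2")
```

With recorded data, the synthetic sweeps would be replaced by band-pass-filtered
LFP segments aligned to stimulus onset; the peak-time variance would then be
compared between conditions, as in the paired tests of Fig. 4.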

Interestingly, we observed a strong correlation between CS+-evoked inhibition of
PVINs and dmPFC theta phase resetting, suggesting that this phenomenon is gated by
PVINs (Fig. 4b). In support of this hypothesis, optogenetic inhibition of PVINs
reproduced theta phase resetting (Fig. 4c and Extended Data Fig. 9). Consistent
with this, dmPFC theta resetting induced by CS+ was blocked by optogenetic
excitation of PVINs (Fig. 4d). Our results indicate that CS+-evoked inhibition of
PVINs mediates theta phase resetting during fear expression, a phenomenon that
might enhance synchronization and efficiency of dmPFC output neurons. To evaluate
whether dmPFC theta phase resetting is associated with spiking synchronization
among PNs during fear expression, we quantified the number of PNs displaying a
significant firing increase during CS− and CS+ presentations. Significantly more
PNs were activated during CS+ relative to CS− presentations (Fig. 5a). This
activation was associated with a significant increase of coincident firing between
pairs of PNs following CS+ (Fig. 5b and Extended Data Fig. 10a). Furthermore, more
PNs were significantly phase-locked to local theta oscillations during CS+ relative
to CS− presentations (Fig. 5c). Consistent with this, comparison of

Figure 3 | dmPFC PVINs disinhibit PNs during fear expression. a, Raster
plots and PSTHs illustrating CS−- and CS+-evoked activities (left and middle
panels, respectively) of a PN during Post FC (CS−/CS+ 1-4, 108 trials). Right
panel, mean PSTHs of PNs recorded during Post FC, Ext. or Ret. (n = 308
neurons from 27 mice, CS− and CS+ 1-4) showing a stronger and significant
increase in response to CS+ compared to CS−. b, Firing of a PN recorded in a
mouse expressing ArchT in dmPFC PVINs at baseline (left panel, no light),
and in response to yellow light (middle panel, light-pulse duration, 250 ms;
108 trials, 0.9 Hz). Right panel, mean PSTHs of all PNs displaying significant
CS+-evoked excitation during Ext. and disinhibited during optogenetic
inhibition of PVINs (n = 27/41 PNs from 7 mice; light-pulse duration, 250 ms;
108 trials, 0.9 Hz). Bins of 20 ms.


the strength of theta phase locking, a measure of spiking synchronization, revealed
a stronger tuning of dmPFC activity to local theta during CS+ periods (Extended
Data Fig. 10b). To evaluate whether enhancement of the spiking synchronization of
PNs with local theta induced by CS+ presentations was causally related to the
inhibition of PVINs, we optogenetically manipulated PVINs and quantified PN theta
phase locking. Our analysis revealed that light-induced inhibition of PVINs
increased, whereas light-induced excitation of PVINs reduced, PN phase locking to
dmPFC theta oscillations (Extended Data Fig. 10c, d). To understand the dynamics of
PN synchronization during theta phase reset, the mean preferred phase of individual
PNs was calculated during the first three theta cycles following CS+ (Supplementary
Methods). Relative to CS− presentations, CS+-induced firing of PNs occurred
significantly more frequently around the peak of the oscillations, thereby creating
precise temporal windows during which PNs were synchronized (Fig. 5d).
Interestingly, similar to the CS+ condition, artificial resetting of local theta
oscillations, either by aligning the phase of individual LFPs during CS−
presentations or by optogenetically inhibiting PVINs, produced synchronization of
PN firing around the peak of theta oscillations (Extended Data Fig. 10e, f). This
observation suggests that the overall phase preference of PNs did not change
between CS− and CS+ conditions, but that PV-mediated theta phase resetting
coordinated and sharpened synchronization among PNs.
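
Theta phase locking of this kind is commonly quantified with the Rayleigh test, and
the threshold quoted in the Fig. 5c legend (ln(Z) = 1.1, P < 0.05) is consistent
with the usual first-order approximation P ≈ exp(−Z), since exp(−exp(1.1)) ≈ 0.0496.
The sketch below assumes the standard form Z = nR², where R is the mean resultant
length of the spike phases; the function names and the example phases are
hypothetical, and the procedure may differ in detail from the one in the
Supplementary Methods.

```python
import math


def mean_resultant(phases):
    """Return (R, preferred_phase) for spike phases given in radians."""
    n = len(phases)
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    return math.hypot(c, s), math.atan2(s, c)


def rayleigh_z(phases):
    """Rayleigh statistic Z = n * R**2."""
    r, _ = mean_resultant(phases)
    return len(phases) * r * r


def rayleigh_p_approx(z):
    """First-order approximation of the Rayleigh P value."""
    return math.exp(-z)


Z_THRESHOLD = math.exp(1.1)  # ln(Z) = 1.1, the threshold quoted in the legend

# Hypothetical unit firing near the theta peak (phase 0): 11 spikes within 0.5 rad.
locked = [0.1 * k for k in range(-5, 6)]
# Hypothetical unit firing evenly across the cycle: 12 equally spaced phases.
uniform = [2 * math.pi * k / 12 for k in range(12)]

z_locked = rayleigh_z(locked)
z_uniform = rayleigh_z(uniform)
r_locked, preferred = mean_resultant(locked)

print(z_locked > Z_THRESHOLD, z_uniform > Z_THRESHOLD)
```

Counting the units whose Z exceeds the threshold in each condition gives the kind
of comparison reported in Fig. 5c, and the preferred phase returned by
`mean_resultant` corresponds to the per-neuron phases plotted in Fig. 5d.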

Converging evidence indicates that dmPFC PNs target both the basolateral amygdala
(BLA) and the periaqueductal grey (PAG), two structures involved in fear
behaviour21,22. This raises the possibility that PNs may modulate fear expression
through direct projections to the PAG and/or the BLA. To disentangle these
possibilities, we antidromically activated dmPFC efferents using extracellular
stimulation of BLA or PAG in anaesthetized mice, following completion of
behaviour. These experiments revealed that PNs disinhibited during CS+
presentations preferentially targeted the BLA (Fig. 5e, f and Extended Data
Fig. 6d). These data indicate that theta phase resetting mediated by PVINs
synchronizes PNs after CS+ presentations and suggest that dmPFC PNs preferentially
target the BLA to drive fear responses.

Using single-unit and LFP recordings combined with optogenetic 
manipulation of PVINs in mice, we have shown that a subpopulation 

Figure 4 | Inhibition of PVINs induces theta phase resetting. a, dmPFC LFP
trace amplitudes (Amp.) filtered in the 8-12-Hz range (left panel, 27 sweeps)
and corresponding standardized trace amplitudes (middle panel) illustrating
theta phase resetting induced by CS+ but not CS− (Post-FC session, first CS−
and first CS+, 27 sweeps). Right panel, time variance of the first theta peak
following CS (Post FC, CS− and CS+ 1-4, n = 28 mice, CS− versus CS+, paired
t-test, ***P < 0.001). b, Correlation between CS-evoked firing of type 2 INs
(mean z score 0-150 ms post CS) and the time variance of the first theta peak
following CS (Post FC, CS− and CS+ 1-4, n = 15 type 2 INs, Pearson’s r = 0.75,
P < 0.05). c, Left panel, dmPFC LFP traces recorded in a mouse expressing
ArchT in PVINs in control conditions (top part; No light, 27 sweeps), and
during optogenetic inhibition of PVINs (bottom part; light duration, 250 ms; 27
sweeps, 0.9 Hz). Right panel, time variance of the first theta peak following light
(n = 9 mice, No light versus Light, paired t-test, *P < 0.05). d, Representative
LFP traces recorded when CS+ was presented alone (left panel, Post-FC session,
first CS+) or paired with the optogenetic activation of PVINs (middle panel,
Post-FC session, fifth CS+; light-pulse duration, 250 ms; 27 sweeps, 0.9 Hz).
Right panel, time variance analyses of theta resetting during CS+ or CS+/Light
(n = 6 mice, CS+ versus CS+/Light, paired t-test, **P < 0.01). Error bars,
mean ± s.e.m.


of PVINs organizes the spiking activity of dmPFC PNs during precise time windows,
through phase resetting of local theta oscillations, to drive fear expression. Our
data indicate that the fine regulation of dmPFC-BLA PNs by a subtype of PVINs is
critical for the expression of fear behaviour. Our results demonstrate that
inhibition of type 2 PVINs during CS+ presentations is causally related to the
expression of conditioned fear responses, and suggest that type 1 INs might inhibit
type 2 INs. The origin of the CS-mediated excitatory responses of type 1 INs
remains to be determined, but it is likely that they receive inputs from structures
involved in the encoding or modulation of conditioned fear such as BLA or
hippocampus23-25.

A key question is what mechanisms can account for our observation that inhibition
of PVINs is necessary and sufficient for the expression of fear responses. Cortical
PVINs are known to inhibit PNs through powerful perisomatic inhibition17. As a
consequence, CS+-evoked inhibition in PVINs induced a strong disinhibition of PNs,
a permissive mechanism that gated neuronal responses during fear expression. These
results indicate that CS-evoked activity in dmPFC PNs during fear expression
results in part from a disinhibitory mechanism. Notably, conditioned freezing was
not entirely prevented by PVIN activation, indicating that some dmPFC PNs may
escape inhibitory control, or

Figure 5 | Synchronization of dmPFC PNs during fear expression.
a, Number of significantly CS-activated PNs recorded during Post-FC, Ext. or
Ret. sessions (CS−, n = 205; CS+ 1-4, n = 308; CS− versus CS+, paired t-test,
P < 0.001). b, Normalized averaged ratio of changes in coincident activity
between pairs of PNs induced by CS− and CS+ (Post-FC, Ext. or Ret. sessions,
n = 975 pairs from 308 PNs). Dashed line, significant z score at P < 0.05 level.
Bins of 30 ms. c, Cumulative distribution of log-transformed Rayleigh’s test Z of
CS-responsive PNs (n = 308; Post-FC, Ext. or Ret. sessions). Dashed line,
significant theta phase locking threshold (ln(Z) = 1.1, P < 0.05; CS−, n = 24
neurons; CS+ (1-4), n = 65 neurons). d, Top panel, preferred theta phase
distributions of PNs (n = 308 neurons, 108 CS pips, Post-FC, Ext. or Ret.
sessions) during theta cycles around CS− (left part, blue bars, 14.7% freezing)
and CS+ (right part, red bars, 69.8% freezing, bins of 45°). Bottom panel,
preferred theta phases of individual PNs. Example 8-12-Hz theta-filtered LFP
traces during CS− and CS+ are represented above for illustrative purposes.
During CS+ but not CS−, resetting of theta oscillations synchronizes the firing
of PNs around the peaks of theta cycles (Rayleigh test for circular uniformity:
first theta cycle post CS, CS+ versus CS−, P < 0.001). e, Left panel, strategy used
to identify connections between PNs and the BLA-PAG. Rec., recording
electrode; S, stimulation artifact; Stim., stimulation electrode. Right panel,
antidromic spikes recorded from a PN in response to BLA stimulations
identified by their low temporal jitter (top trace, 10 trials), collisions with
spontaneously (Spont.) occurring spikes (middle trace, 10 trials) and ability to
follow high-frequency stimulation (bottom trace, 250 Hz, 10 trials). f, PNs
exhibiting antidromic responses to BLA stimulations displayed CS+-evoked
excitation (9/9 neurons). Only a small fraction of PNs exhibiting antidromic
responses to PAG stimulation displayed CS+-evoked excitation (1/7 neurons,
14.3%). Thin arrow indicates that fewer neurons that project to the PAG
are disinhibited; thick arrow indicates that more neurons that project to the
BLA are disinhibited.


that other brain regions promote fear responses in concert with 
dmPFC. 

Although fear behaviour was associated with a reduction in dmPFC theta-oscillation
magnitude, CS+-evoked inhibition of PVINs induced a robust and transient theta
phase resetting spanning two to three theta cycles. Transient theta phase resetting
has been previously observed in cortical regions following electrical or sensory
stimulations26-28. Our findings provide the first mechanistic explanation of phase
resetting at the cellular level and extend this phenomenon to the control of
emotional behaviour. Functionally, we observed that theta phase resetting
synchronized PNs around theta peaks without changing the preferred phases of
individual PNs. This observation suggests that resetting of local theta
oscillations, but not the preferred phases of individual PNs to the local LFP, is
critically involved in the expression of fear responses. Thus, theta phase
resetting represents a powerful mechanism for reliable fear expression because it
creates an optimal temporal relationship that binds spiking activity with sensory
information provided by CS. Ultimately, phase resetting of oscillations is a
powerful mechanism that enhances the impact of input signals and enables
transmission of information to downstream targets. Our data also show that
reduction of rhythmic inhibition from PVINs paradoxically increases synchrony.
Suppression of interference between two oscillators may account for this effect.
Future work will be needed to identify the origin of dmPFC theta oscillations that
are unmasked by PVIN inhibition.

Another question is how synchronized PNs can control fear expression. Previous
findings suggest that putative dmPFC PNs displaying sustained or transient changes
in their spiking activity promote fear expression through activation and
synchronization of BLA neurons9,29,30. In line with these studies, our results
demonstrate that PNs exhibiting CS+-evoked synchronized firing during fear
expression preferentially project to the BLA where they may target specific
neuronal populations activated during fear behaviour23.

Finally, our findings suggest that persistent fear behaviour, which is 
at the core of psychiatric conditions such as anxiety disorders, may be 
finely regulated at the level of specific prefrontal inhibitory circuits. 


METHODS SUMMARY 


Mice were submitted to a fear-conditioning paradigm in which the CS+ but not the
CS− was paired with a mild foot-shock (US). Extinction training was carried out
over 2 days and mice were tested 1 week later for a retrieval session23. For
optogenetic manipulations, PV-IRES-Cre mice received stereotaxic injections of AAV
viruses encoding channelrhodopsin or archaeorhodopsin in the dmPFC. Bilateral
activation of archaeorhodopsin or channelrhodopsin was performed using implanted
optic fibres coupled to a laser beam. Inactivation of the medial septum was
achieved using local pressure injection of fluorescently labelled muscimol.
Individual neurons were recorded extracellularly and spikes were sorted by
time-amplitude window discrimination and template matching as described23.
CS-evoked responses were normalized to baseline activity using a z-score
transformation. Antidromic and orthodromic spikes evoked by extracellular
stimulations of the BLA or PAG were recorded in neurons isolated from behavioural
sessions and recorded in urethane-anaesthetized mice, after completion of
behaviour. In vitro whole-cell voltage and current-clamp recordings were performed
using glass pipettes (4-6 MΩ) filled with K-gluconate-based solutions.
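
The z-score normalization of CS-evoked responses mentioned above can be sketched as
follows. This is a minimal illustration under stated assumptions: the baseline is
taken as the pre-stimulus bins of the PSTH, the population standard deviation is
used, and the spike counts are invented for the example; none of these choices is
specified in this summary.

```python
import statistics


def zscore_psth(counts, n_baseline_bins):
    """Express each PSTH bin as a z score relative to the pre-stimulus baseline."""
    baseline = counts[:n_baseline_bins]
    mu = statistics.mean(baseline)
    sd = statistics.pstdev(baseline)  # population s.d. (an assumption)
    if sd == 0:
        raise ValueError("baseline has zero variance; z score is undefined")
    return [(c - mu) / sd for c in counts]


def mean_response(z, first_bin, n_bins):
    """Mean z score over a post-stimulus window (0-150 ms is 15 bins of 10 ms)."""
    return statistics.mean(z[first_bin:first_bin + n_bins])


# Hypothetical spike counts: 8 baseline bins followed by 3 post-stimulus bins.
counts = [4, 6, 5, 5, 4, 6, 5, 5, 9, 10, 8]
z = zscore_psth(counts, n_baseline_bins=8)
evoked = mean_response(z, first_bin=8, n_bins=3)
print(round(evoked, 2))
```

A positive mean z score in the post-stimulus window corresponds to excitation and a
negative one to inhibition, which is how the type 1 and type 2 responses are
summarized in the Fig. 1 legend.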


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 


Received 15 May; accepted 9 October 2013. 
Published online 20 November 2013. 


1. Singer, W. Neuronal synchrony: a versatile code for the definition of relations? 
Neuron 24, 49-65, 111-125 (1999). 

2. Buzsaki, G. & Draguhn, A. Neuronal oscillations in cortical networks. Science 304, 
1926-1929 (2004). 

3. Womelsdorf, T. et al. Modulation of neuronal interactions through neuronal
synchronization. Science 316, 1609-1612 (2007).

4. Royer, S. et al. Control of timing, rate and bursts of hippocampal place cells by 
dendritic and somatic inhibition. Nature Neurosci. 15, 769-775 (2012). 

5. Cobb, S. R., Buhl, E. H., Halasy, K., Paulsen, O. & Somogyi, P. Synchronization of 
neuronal activity in hippocampus by individual GABAergic interneurons. Nature 
378, 75-78 (1995). 

6. Benchenane, K. et al. Coherent theta oscillations and reorganization of spike
timing in the hippocampal-prefrontal network upon learning. Neuron 66,
921-936 (2010).

7. Friedrich, R. W., Habermann, C. J. & Laurent, G. Multiplexing using synchrony in the 
zebrafish olfactory bulb. Nature Neurosci. 7, 862-871 (2004). 

8. Riehle, A., Grun, S., Diesmann, M. & Aertsen, A. Spike synchronization and rate 
modulation differentially involved in motor cortical function. Science 278, 
1950-1953 (1997). 


9. Burgos-Robles, A., Vidal-Gonzalez, I. & Quirk, G. J. Sustained conditioned
responses in prelimbic prefrontal neurons are correlated with fear expression and
extinction failure. J. Neurosci. 29, 8474-8482 (2009).

10. Tang, J. et al. Pavlovian fear memory induced by activation in the anterior
cingulate cortex. Mol. Pain 1, 6 (2005).

11. Vidal-Gonzalez, I., Vidal-Gonzalez, B., Rauch, S. L. & Quirk, G. J.
Microstimulation reveals opposing influences of prelimbic and infralimbic cortex on
the expression of conditioned fear. Learn. Mem. 13, 728-733 (2006).

12. Corcoran, K. A. & Quirk, G. J. Activity in prelimbic cortex is necessary for the
expression of learned, but not innate, fears. J. Neurosci. 27, 840-844 (2007).

13. Pape, H. C. & Pare, D. Plastic synaptic networks of the amygdala for the
acquisition, expression, and extinction of conditioned fear. Physiol. Rev. 90,
419-463 (2010).

14. Knapska, E. et al. Functional anatomy of neural circuits regulating fear and
extinction. Proc. Natl Acad. Sci. USA 109, 17093-17098 (2012).

15. LeDoux, J. E. Emotion circuits in the brain. Annu. Rev. Neurosci. 23, 155-184
(2000).

16. Hartwich, K., Pollak, T. & Klausberger, T. Distinct firing patterns of identified
basket and dendrite-targeting interneurons in the prefrontal cortex during
hippocampal theta and local spindle oscillations. J. Neurosci. 29, 9563-9574 (2009).

17. Freund, T. F. & Katona, I. Perisomatic inhibition. Neuron 56, 33-42 (2007).

18. Blatow, M. et al. A novel network of multipolar bursting interneurons generates
theta frequency oscillations in neocortex. Neuron 38, 805-817 (2003).

19. Losonczy, A., Zemelman, B. V., Vaziri, A. & Magee, J. C. Network mechanisms of
theta related neuronal activity in hippocampal CA1 pyramidal neurons. Nature
Neurosci. 13, 967-972 (2010).

20. Yoder, R. M. & Pang, K. C. Involvement of GABAergic and cholinergic medial septal
neurons in hippocampal theta rhythm. Hippocampus 15, 381-392 (2005).

21. Gabbott, P. L., Warner, T. A., Jays, P. R., Salway, P. & Busby, S. J. Prefrontal
cortex in the rat: projections to subcortical autonomic, motor, and limbic centers.
J. Comp. Neurol. 492, 145-177 (2005).

22. Di Scala, G., Mana, M. J., Jacobs, W. J. & Phillips, A. G. Evidence of Pavlovian
conditioned fear following electrical stimulation of the periaqueductal grey in the
rat. Physiol. Behav. 40, 55-63 (1987).

23. Herry, C. et al. Switching on and off fear by distinct neuronal circuits. Nature
454, 600-606 (2008).

24. Sotres-Bayon, F., Sierra-Mercado, D., Pardilla-Delgado, E. & Quirk, G. J. Gating
of fear in prelimbic cortex by hippocampal and amygdala inputs. Neuron 76,
804-812 (2012).




25. Tierney, P. L., Degenetais, E., Thierry, A. M., Glowinski, J. & Gioanni, Y.
Influence of the hippocampus on interneurons of the rat prefrontal cortex. Eur. J.
Neurosci. 20, 514-524 (2004).

26. McCartney, H., Johnson, A. D., Weil, Z. M. & Givens, B. Theta reset produces
optimal conditions for long-term potentiation. Hippocampus 14, 684-687 (2004).

27. Buzsaki, G., Grastyan, E., Tveritskaya, I. N. & Czopf, J. Hippocampal evoked
potentials and EEG changes during classical conditioning in the rat.
Electroencephalogr. Clin. Neurophysiol. 47, 64-74 (1979).

28. Rizzuto, D. S. et al. Reset of human neocortical oscillations during a working
memory task. Proc. Natl Acad. Sci. USA 100, 7931-7936 (2003).

29. Livneh, U. & Paz, R. Amygdala-prefrontal synchronization underlies resistance to 
extinction of aversive memories. Neuron 75, 133-142 (2012). 

30. Chang, C. H., Berke, J. D. & Maren, S. Single-unit activity in the medial prefrontal 
cortex during immediate and delayed extinction of fear in rats. PLoS ONE 5, 
e11971 (2010). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank members of the Herry laboratory, K. Benchenane and 
D. Dupret for comments on the manuscript, K. Deisseroth and E. Boyden for generously 
sharing material, J. Bacelo, S. Wolff and P. Tovote for technical and computational 
assistance, the Bordeaux Imaging center of the University of Bordeaux, and C. Poujol 
and S. Marais for technical assistance with microscopy. This work was supported by 
grants from the French National Research Agency (ANR-2010-BLAN-1442-01; 
ANR-10-EQPX-08 OPTOPATH), the European Research Council (ERC) under the 
European Union’s Seventh Framework Program (FP7/2007-2013)/ERC grant 
agreement no. 281168, a Fonds AXA pour la recherche doctoral fellowship (J.C.) and 
the Conseil Regional d’Aquitaine. T.C.M.B. is a fellow of Ecole de l’Inserm Liliane
Bettencourt-MD-PhD program, France.


Author Contributions J.C., F.C., R.R.R., N.K., C.G.-C., H.W., A.A., J.B. and T.C.M.B.
performed the experiments and analysed the data. J.C. and C.H. designed the 
experiments and wrote the paper. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of the paper. Correspondence 
and requests for materials should be addressed to C.H. (cyril.herry@inserm.fr). 


METHODS 

Animals. Male C57BL6/J mice (3 months old, Janvier) and PV-IRES-Cre mice
(3 months old, Jackson Laboratory, B6;129P2-Pvalbtm1(cre)Arbr/J) were individually
housed for at least 7 days before all experiments, under a 12-h light-dark cycle,
and provided with food and water ad libitum. All procedures were performed in
accordance with standard ethical guidelines (European Communities Directive
86/60-EEC) and were approved by the committee on Animal Health and Care of
Institut National de la Santé et de la Recherche Médicale and French Ministry of
Agriculture and Forestry (authorization A3312001).

Behaviour. Fear conditioning and extinction took place in two different contexts 
(context A and B). The conditioning and extinction boxes were cleaned with 70% 
ethanol and 1% acetic acid before and after each session, respectively. To score 
freezing behaviour, an automated infrared beam detection system located on the 
bottom of the experimental chambers was used (Coulbourn Instruments). The 
animals were considered to be freezing if no movement was detected for 2 s. On
day 1, C57BL6/J mice were submitted to a habituation session in context A, in
which they received four presentations of the CS+ and of the CS− (total CS
duration, 30 s; consisting of 50-ms pips at 0.9 Hz repeated 27 times, 2 ms rise
and fall; pip frequency, 7.5 kHz or white-noise, 80 dB sound pressure level).
Discriminative fear conditioning was performed on the same day by pairing the
CS+ with a US (1-s foot-shock, 0.6 mA, 5 CS+–US pairings; inter-trial intervals,
20-180 s). The onset of the US coincided with the offset of the CS+. The CS− was
presented after each CS+–US association but was never reinforced (five CS−
presentations; inter-trial intervals, 20-180 s). The frequencies used for CS+ and
CS− were counterbalanced across animals. On day 2 and day 3, conditioned mice
were submitted to extinction training (post-fear-conditioning and extinction
sessions) in context B during which they received 4 and 12 presentations of the CS−
and CS+, respectively. Retrieval of fear was tested 7 days later in context B, with 4
presentations of the CS− and the CS+. Four distinct behavioural experiments were
performed to collect the entire data set. 
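
As an illustration of the stimulus timing and the freezing criterion described above, the following minimal Python sketch computes the pip onsets within one CS (27 pips at 0.9 Hz) and the time spent freezing from a list of movement detections. The function names, the example movement times and the choice to count the whole movement-free gap as freezing once it reaches 2 s are assumptions made for illustration; they are not taken from the acquisition software.

```python
def pip_onsets(n_pips=27, rate_hz=0.9):
    """Onset time (s) of each pip within one CS, relative to CS onset."""
    return [k / rate_hz for k in range(n_pips)]


def freezing_seconds(movement_times, session_end, min_gap=2.0):
    """Total time (s) spent in movement-free gaps of at least min_gap seconds.

    Assumption: once a gap reaches min_gap, the whole gap counts as freezing.
    """
    total = 0.0
    previous = 0.0  # assumption: the session starts at t = 0 s
    for t in sorted(movement_times) + [session_end]:
        gap = t - previous
        if gap >= min_gap:
            total += gap
        previous = t
    return total


onsets = pip_onsets()
cs_duration = len(onsets) / 0.9  # 27 pip periods at 0.9 Hz, about 30 s

# Hypothetical beam-break times (s) in a 10-s excerpt
frozen = freezing_seconds([0.0, 1.0, 5.0, 5.5, 10.0], session_end=10.0)
```

With these hypothetical detections, the two movement-free gaps of 4 s and 4.5 s are scored as freezing, whereas the shorter gaps are not.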

For optogenetic experiments using archaeorhodopsin, PV-IRES-Cre mice were
submitted on day 1 to a pre-fear-conditioning session in context A during which
they received yellow light stimulations (250-ms pulses repeated at 0.9 Hz during
2 min). Fear conditioning was performed on day 2 in context A, by pairing the CS+
with the US (1-s foot-shock, 0.6 mA, 5 CS+/US pairings; inter-trial interval,
20-180 s). On day 2 and day 3, conditioned mice were submitted to extinction training
(post-fear-conditioning and extinction sessions) in context B during which they
received 12 presentations of the CS+. At the end of the last extinction session they
received an additional four presentations of the CS+ coupled to yellow light
stimulations (each CS+ pip was paired with a 250-ms light pulse). For optogenetic
experiments using archaeorhodopsin, two distinct behavioural experiments were
performed to collect the entire data set. For optogenetic experiments using
channelrhodopsin, PV-IRES-Cre mice were submitted on day 1 to the same fear
conditioning protocol as above. A post-conditioning test was performed on day 2 in
context B and consisted of four presentations of the CS+ alone followed by four
presentations of the CS+ coupled to blue light stimulations (each CS+ pip was
paired with a 250-ms light pulse). On day 3, mice were submitted to a second test in
context B (Test) in which they received four presentations of the CS+. For
optogenetic experiments using channelrhodopsin, two distinct behavioural
experiments were performed to collect the entire data set.

For the place-avoidance experiment, we used an apparatus composed of two 
plexiglas compartments (20 × 10 cm each) connected by an alleyway. The two
compartments differed tactilely (smooth plastic versus metal bars) and visually 
(grey plexiglas with red horizontal stripes or grey plexiglas). The time spent in each 
compartment was automatically recorded by an infrared beam detection system 
located on the bottom of the apparatus (Imetronic). On day 1, mice were allowed 
to explore freely the entire apparatus during a 15-min pre-exposure. Following 
pre-exposure, the compartment in which the mice spent the most time was desig- 
nated as the most visited compartment. On day 2, mice were submitted to a 15-min 
test session during which light pulses (250-ms pulse width, repeated at 0.9 Hz) 
were delivered while the animals occupied the most visited compartment, but not 
when they occupied the less-visited compartment. The automated infrared beam 
sensors detected when the animal fully entered and exited the most visited com- 
partment on day 1. The laser was automatically turned on for the period of time in 
which the animal stayed in the most visited compartment. The laser was turned on 
only when the animal fully entered the most visited compartment, not before the 
entrance. For place avoidance experiments, two distinct behavioural experiments 
were performed to collect the entire data set. 

For pharmacological experiments, C57BL6/J mice were submitted to a fear 
conditioning paradigm consisting of CS+ and US pairings in context A as described
above. On days 2, 3 and 4, conditioned mice were tested in context B during which
they received four presentations of the CS+ before muscimol injections (Day 2,
Test pre-MUS), 5 min after muscimol injections (Day 3, Test MUS), and 24 h
following muscimol injections (Day 4, Test post-MUS). For pharmacological
experiments, two distinct behavioural experiments were performed to collect the 
entire data set. 

Surgery and recordings. Mice were anaesthetized with isoflurane (induction 3%, 
maintenance 1.5%) in O2. Body temperature was maintained at 37 °C with a
temperature controller system (FHC). Mice were secured in a stereotaxic frame 
and unilaterally implanted in the left dorsomedial prefrontal cortex (dmPFC) with 
a multi-wire electrode array aimed at the following coordinates: 2 mm anterior to 
bregma; 0.3 mm lateral to the midline; and 0.8 to 1.4 mm ventral to the cortical 
surface. A subset of animals (n = 10) were also implanted in the dorsal hippocam- 
pus (dHip) at the following coordinates: 2 mm posterior to bregma; 1.2 mm lateral 
to midline; and 1.2 to 1.4 mm ventral to the cortical surface. The electrodes 
consisted of 16 individually insulated nichrome wires (13 μm inner diameter,
impedance 30-100 kΩ; Kanthal) contained in a 26-gauge stainless-steel guide
cannula. The wires were attached to an 18-pin connector (Omnetics). For mice
that received dmPFC and dHip multi-wire implants, two connectors were used.
All implants were secured using Super-Bond cement (Sun Medical). After surgery
mice were allowed to recover for 7 days and were habituated to handling. Analgesia 
was applied before, and 1 day after surgery (Metacam, Boehringer). Electrodes 
were connected to a headstage (Plexon) containing sixteen unity-gain operational 
amplifiers. The headstage was connected to a 16-channel preamplifier (gain 100×,
bandpass filter from 150 Hz to 9 kHz for unit activity and from 0.7 Hz to 170 Hz for 
field potentials, Plexon). Spiking activity was digitized at 40 kHz and bandpass 
filtered from 250 Hz to 8 kHz, and isolated by time-amplitude window discrim- 
ination and template matching using a Multichannel Acquisition Processor system 
(Plexon). At the conclusion of the experiment, recording sites were marked with 
electrolytic lesions before perfusion, and electrode tip locations were
reconstructed with standard histological techniques.

Single-unit analyses. Single-unit spike sorting was performed using Off-Line 
Spike Sorter (OFSS, Plexon) for all behavioural sessions. Principal-component 
scores were calculated for unsorted waveforms and plotted in a three-dimensional 
principal-component space; clusters containing similar valid waveforms were 
manually defined. A group of waveforms were considered to be generated from 
a single neuron if the waveforms formed a discrete, isolated, cluster in the prin- 
cipal-component space and did not contain a refractory period less than 1 ms, as 
assessed using auto-correlogram analyses. To avoid analysis of the same neuron 
recorded on different channels, we computed cross-correlation histograms. If a 
target neuron presented a peak of activity at a time that the reference neuron fired, 
only one of the two neurons was considered for further analysis. After fear con- 
ditioning, if the same neuron was sequentially recorded during different beha- 
vioural sessions, we considered only the first behavioural session in which it was 
recorded. To separate putative inhibitory interneurons (INs) from putative excit- 
atory principal neurons (PNs) we used an unsupervised cluster algorithm based on 
Ward’s method. In brief, the Euclidean distance was calculated between all neuron
pairs based on the three-dimensional space defined by each neuron’s average half- 
spike width (measured from trough to peak), the firing rate and the area under the 
hyperpolarization phase of the spike. An iterative agglomerative procedure was 
then used to combine neurons into groups based on the matrix of distances such 
that the total number of groups was reduced to give the smallest possible increase
in the within-group sum of squared deviations. To assess the significance of cross-correlogram
analyses performed between pairs of recorded neurons, a mean firing rate with 
95% confidence limits of the target neuron was calculated. Significant short- 
latency inhibitory or excitatory interactions were retained if the number of action 
potentials of the target neuron was below or above the 95% confidence
limits, respectively. Moreover, to show that cross-correlations were not simply 
occurring by chance or were due to CS presentations, we performed two controls. 
First, the spike train of the neuron was shuffled 100 times and a shuffled cross- 
correlogram was computed. Absence of short-latency interaction in the shuffled 
cross-correlogram was indicative that the cross-correlations were not due to 
chance. Second, to control that short-latency interactions were not artificially 
induced by stimulus presentations, we computed a shift predictor and subtracted it 
from the original cross-correlogram. Persistence of short-latency cross-correlations 
indicates that the neuronal interactions were not due to CS presentations. CS or 
light-induced neural activity of recorded neurons was calculated by comparing the 
firing rate after stimulus onset with the firing rate recorded during the 500 ms 
before stimulus onset (bin size of 10 ms) using a z-score transformation. z-score 
values were calculated by subtracting the average baseline firing rate established 
over the 500 ms preceding stimulus onset from individual raw values and by 
dividing the difference by the baseline standard deviation. Only CS+-responsive
neurons (at least one significant positive or negative z-score bin (z-score > ±1.67,
P < 0.05) within 100 ms following CS onset) were considered for further analysis.
For statistical analysis, z-score comparisons were performed using the average 
z-score value calculated during the 150 ms after CS onset. 
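
The z-score transformation described above can be written compactly. The sketch below assumes binned spike counts (10-ms bins) with the 500-ms baseline occupying the first 50 bins; the function names and the use of the population standard deviation are assumptions, as the text does not specify which estimator was used.

```python
from statistics import mean, pstdev


def zscore_bins(counts, n_baseline_bins=50):
    """z-score every bin against the pre-stimulus baseline (first bins)."""
    baseline = counts[:n_baseline_bins]
    mu = mean(baseline)
    sd = pstdev(baseline)  # assumption: population SD of the baseline bins
    if sd == 0:
        raise ValueError("baseline has zero variance; z-score undefined")
    return [(c - mu) / sd for c in counts]


def is_responsive(z, n_baseline_bins=50, window_bins=10, threshold=1.67):
    """True if any bin within 100 ms of stimulus onset exceeds the threshold."""
    window = z[n_baseline_bins:n_baseline_bins + window_bins]
    return any(abs(v) > threshold for v in window)


# Hypothetical counts: 50 baseline bins (mean 5, SD 1), then 10 post-onset bins
counts = [4, 6] * 25 + [8, 5, 5, 5, 5, 5, 5, 5, 5, 5]
z = zscore_bins(counts)
responsive = is_responsive(z)
```

In this hypothetical example the first post-onset bin lies three baseline standard deviations above the baseline mean, so the unit would be retained as responsive.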
To identify the main firing patterns among INs, we used an unbiased principal
component analysis (PCA) based on the neuronal activity evoked by CS+
presentations (z-score 500 ms before and after CS+ presentations, CS+ presentations
1-4 in post-fear-conditioning, extinction and retrieval sessions, each CS+
consisting of 27 individual sound pips; bin size of 10 ms). Only the first principal
component was considered (PC1) because it explained the largest share of the
variance of our data set. Type 1 and type 2 interneurons were defined as correlated
and inversely correlated, respectively, with PC1 at the P < 0.001 significance level.
Co-firing between recorded PN pairs was established by quantifying the number of
non-overlapping 30-ms time windows following CS+ presentations during which
co-firing events occurred (each pip presentation, CS+ presentations 1-4, 108 pips,
post-fear-conditioning, extinction or retrieval sessions). We then calculated a ratio
of coincident firing by dividing the number of co-firing occurrences during CS+
presentations by those obtained during CS− presentations. This coincident firing
ratio was normalized to the pre-CS period (500 ms pre CS) using a z-score
transformation. To verify that the changes in coincident firing between CS+ and CS−
conditions were not due to an increase in PN firing rate during CS+ presentations,
the same analysis was performed but this time the number of co-firing events
in each 30-ms time window was normalized by the total number of spikes of the
two neurons in this particular time window. Statistical analyses were performed
using paired Student’s t-tests post hoc comparisons at the P < 0.05 level of
significance unless indicated otherwise. Results are presented as mean ± s.e.m.
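
The co-firing measure can be illustrated with a short Python sketch. It counts, for each non-overlapping 30-ms window after pip onset, the pips on which both neurons of a pair fired, and then forms the CS+/CS− ratio window by window. Spike times are in milliseconds; the function names, the number of windows and the example spike trains are hypothetical, and the subsequent z-score normalization to the pre-CS period is not shown.

```python
def cofiring_counts(spikes_a_ms, spikes_b_ms, pip_onsets_ms,
                    window_ms=30, n_windows=3):
    """Per window index, number of pips on which both neurons fired."""
    counts = [0] * n_windows
    for onset in pip_onsets_ms:
        for k in range(n_windows):
            start = onset + k * window_ms
            stop = start + window_ms
            a_fired = any(start <= t < stop for t in spikes_a_ms)
            b_fired = any(start <= t < stop for t in spikes_b_ms)
            if a_fired and b_fired:
                counts[k] += 1
    return counts


def cofiring_ratio(counts_plus, counts_minus):
    """CS+/CS- ratio per window; None where the CS- count is zero."""
    return [p / m if m else None for p, m in zip(counts_plus, counts_minus)]


# Hypothetical spike times (ms) for one PN pair and two pips per CS type
plus = cofiring_counts([5, 40, 1005], [10, 70, 1020], [0, 1000])
minus = cofiring_counts([5, 500], [12, 900], [0, 1000])
ratio = cofiring_ratio(plus, minus)
```

Returning None for windows without any CS− co-firing is a design choice of this sketch to avoid division by zero; the source does not state how such windows were handled.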
Statistical analyses. For each statistical analysis provided in the manuscript, the 
Kolmogorov-Smirnov normality test was first performed on the data to determine 
whether parametric or non-parametric tests were required. Two different approaches 
were used to calculate the sample size. For studies in which we had sufficient 
information on response variables, power analyses were carried out to determine
the number of mice needed. For studies in which the behavioural effect of the
manipulation could not be pre-specified, such as optogenetic experiments, we used 
a sequential stopping rule (SSR). In essence this method enables null-hypothesis 
tests to be used in sequential stages, by analysing the data at several experimental 
points using t-tests. Usually the experiment started by testing only a few animals 
and if the P value was below 0.05, the investigator declared the effect significant and 
stopped testing. If the P value was greater than 0.36, the investigator stopped the 
experiment and retained the null hypothesis. For sample-size estimation using 
power analyses, we used an on-line power analysis calculator (G*Power 3). For
each analysis, sample size was determined using a power >0.9 and alpha error = 0.05. 
All tests were two sided. Power analyses were computed for matched pairs (dif- 
ferential conditioning protocol in which we used an internal control (Fig. 1) and 
pharmacological experiments (Extended Data Fig. 8)). In our behavioural experi- 
ments, a critical parameter is freezing level, and the numerical endpoint typically 
ranges between 50 and 70% freezing for CS+ presentations immediately following
auditory fear conditioning and between 10 and 30% freezing for CS−
presentations. A minimum biologically significant difference in the mean values between
CS− and CS+ conditions (Fig. 1), or between CS+ presentations before and after
pharmacological treatment (Extended Data Fig. 8) is 1.5-fold. If we assume a
standard deviation of 1.5 for a mean value of 60% freezing for CS+ and 20%
freezing for CS− or CS+ after pharmacological treatment (which are realistic
numbers), then a minimal n = 6 (paired t-test) or n = 8 (Wilcoxon signed-rank
test) is needed to reject the null hypothesis with 90% probability. Sample-size
determination using SSR analyses was used for optogenetic experiments in
which it was not possible to determine a priori the effect of the optical
manipulation. We used P values of 0.05 and 0.36 as the lower and upper criteria. Using this
strategy we ended up with an n comprising between 6 and 13 animals per group. 
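
The decision logic of the sequential stopping rule can be sketched as below, using the lower and upper criteria stated above (0.05 and 0.36). The function names and the example P values are hypothetical; the sketch takes the P value of each interim t-test as input and assumes that testing continues with additional animals whenever P falls between the two criteria.

```python
def ssr_decision(p_value, lower=0.05, upper=0.36):
    """Decision at one interim analysis of a sequential stopping rule."""
    if p_value < lower:
        return "stop_significant"  # declare the effect significant
    if p_value > upper:
        return "stop_null"         # retain the null hypothesis
    return "continue"              # assumption: add animals and test again


def run_ssr(p_values_by_stage):
    """Return (stage, decision) at which the experiment stops, if it does."""
    for stage, p in enumerate(p_values_by_stage, start=1):
        decision = ssr_decision(p)
        if decision != "continue":
            return stage, decision
    return len(p_values_by_stage), "continue"


# Hypothetical interim P values from three successive stages
outcome = run_ssr([0.20, 0.11, 0.03])
```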
Extended Data Figure 1 | Separation of putative principal neurons
and putative interneurons. a, Left panel, superimposed waveforms recorded
from two different units. Right panel, spikes originating from individual units
were sorted using three-dimensional principal-component analysis.
b, Corresponding auto-correlograms, colour-coded as in a, displaying clear
refractory periods. c, Among the population of dmPFC neurons displaying
significant excitatory or inhibitory CS+-evoked responses (n = 493), 71.2%
were classified as putative principal neurons (PNs, blue circles, n = 351) and
28.8% as putative interneurons (INs, red circles, n = 142) using an unbiased
unsupervised cluster-separation algorithm based on three electrophysiological
properties: firing frequency, spike half-width and spike area under waveform
peak (AUP). Inset, average waveform of a representative PN and IN illustrating
the methodology used to quantify spike width (SW) and the spike segment used
to calculate the AUP. d, Top panel, representative cross-correlogram 
performed between a putative inhibitory IN and a non-identified neuron 
showing a short-latency, presumably monosynaptic, inhibitory interaction (7 
pairs identified among putative INs, no inhibitory interaction among putative 
PNs). Bottom panel, representative cross-correlogram between a putative PN 
and a non-identified neuron showing a short-latency, possibly monosynaptic, 
excitatory interaction (20 pairs identified among PNs, no excitatory interaction 
from putative INs). Reference events correspond to the spikes of the pre- 
synaptic neuron (dashed line at time 0, bins of 0.5 ms). Grey circles represent 
neurons that were not tone-responsive. 
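
The unsupervised separation based on three electrophysiological properties can be illustrated with a naive agglomerative implementation of Ward’s criterion. This is a minimal sketch and not the analysis code used for the figure: the feature values are invented, the features are standardized before clustering (an assumption, since the three properties have different units), and the function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(points):
    """z-score each feature column so that all features share one scale."""
    columns = list(zip(*points))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(p, mus, sds)] for p in points]


def ward_clusters(points, n_clusters=2):
    """Merge clusters so that each merge gives the smallest possible
    increase in the within-group sum of squared deviations."""
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        return [mean(points[i][d] for i in members)
                for d in range(len(points[0]))]

    def merge_cost(a, b):
        d2 = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > n_clusters:
        _, i, j = min((merge_cost(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical units: (half-spike width in ms, firing rate in Hz, AUP)
units = [(0.20, 20, 50), (0.25, 25, 60), (0.22, 22, 55),
         (0.50, 2, 200), (0.55, 3, 210), (0.60, 1, 190)]
groups = ward_clusters(standardize(units), n_clusters=2)
```

In this invented data set the narrow-spike, high-rate units and the wide-spike, low-rate units fall into separate groups, mirroring the putative IN and PN classes.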
Extended Data Figure 2 | CS+-evoked firing patterns and inhibitory
interactions of putative INs. a, Left panel, distribution of the fraction of
variance for the 20 first principal components (PCs) obtained with principal
component analysis (PCA). PC1, which accounted for more than 20% of
variance of the data set, was used for the analysis. Middle panel, first principal-
component coefficients representing the main firing pattern evoked by CS+
(CS+ onset at time 0) of the IN data set. Right panel, distribution of dmPFC IN
correlation coefficients with PC1. The dashed lines indicate the levels of
significance (P < 0.001). Among the 142 INs, 83 (58.5%) displayed a significant
positive (n = 68, 48%, dark red bars) or negative (n = 15, 10.6%, light red bars)
correlation with PC1, whereas 41.5% INs (n = 59, grey bars) did not. b, Raster
plots and PSTH of individual INs negatively correlated (left part, type 2 IN), not
correlated (middle part) or positively correlated (right part, type 1 IN) with
PC1. Type 1 INs were excited, whereas type 2 INs were inhibited by CS+. Bins
of 10 ms. c, PSTH of all type 1 (n = 68) and type 2 (n = 15) INs illustrating the
CS+-evoked responses (Post-FC, Ext. or Ret. sessions, CS+ 1-4). Bins of 10 ms.
d, Individual (type 1 INs, dark red dots; type 2 INs, light red dots) and averaged
(red dots) latencies of the first significant time bin (z score < −1.65 or
> +1.65) following CS+ for type 1 and type 2 INs recorded simultaneously
(n = 7 pairs recorded in 5 mice). CS+-evoked excitation in type 1 INs preceded
CS+-evoked inhibition in type 2 INs (mean latency: type 1, 24.3 ± 2 ms; type 2,
38.6 ± 4.6 ms; paired t-test, *P < 0.05). Error bars, mean ± s.e.m. e, Cross-
correlation analysis performed between a type 1 and a type 2 IN recorded
simultaneously outside CS. The cross-correlogram shows a short latency,
potentially monosynaptic, inhibitory interaction. Reference event, spikes of the
type 1 IN (dashed line at time 0). Bins of 5 ms. f, Locations of recording sites and
mean firing frequencies of type 1 (T1, n = 68) and type 2 (T2, n = 15) INs
(Mann-Whitney test, **P < 0.01; Cg1, anterior cingulate cortex; PL, prelimbic
area; IL, infralimbic area). g, Firing modulation of representative type 1 and
type 2 INs with dmPFC theta oscillations filtered in the 8-12-Hz range (12-min
recordings). Bins of 10°. h, Mean strength of firing synchronization to local
theta oscillations as measured with the mean resultant length (MRL) vector (left
panel, Mann-Whitney test, type 1 versus type 2, ***P < 0.001) and
distribution of the preferred phases (right panels) for type 1 and type 2 INs
significantly phase-locked to theta oscillations (type 1, n = 29/68; type 2,
n = 15/15).
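
For reference, the mean resultant length (MRL) and the preferred phase used in panel h can be computed from the theta phase of each spike as in the minimal sketch below. The function name and the example phases are hypothetical; phases are assumed to be expressed in radians.

```python
import math


def mean_resultant(phases_rad):
    """Return (MRL, preferred phase in radians within [0, 2*pi))."""
    n = len(phases_rad)
    c = sum(math.cos(p) for p in phases_rad) / n
    s = sum(math.sin(p) for p in phases_rad) / n
    mrl = math.hypot(c, s)                       # 0 = no locking, 1 = perfect
    preferred = math.atan2(s, c) % (2 * math.pi)
    return mrl, preferred


# Hypothetical spike phases: perfectly locked versus evenly opposed
locked_mrl, locked_phase = mean_resultant([math.pi / 2] * 20)
opposed_mrl, _ = mean_resultant([0.0, math.pi] * 10)
```

An MRL close to 1 indicates that spikes concentrate at one theta phase, whereas an MRL close to 0 indicates no phase preference.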
Extended Data Figure 3 | Anatomical characterization of AAV-mediated
ArchT-GFP expression in PV-IRES-Cre transgenic mice and
electrophysiological characteristics of ArchT, ChR2 and GFP PV-IRES-
Cre-infected PV neurons. a, Representative confocal micrographs used for PV 
and GFP co-localization assessment. Left panel, ArchT-GFP labelled with anti- 
GFP Alexa 488; middle panel, PV immunofluorescence; right panel, merge. 
Single optical slices, in the same focal plane. b, Quantitative analysis of viral 
infection specificity and efficacy. Pie charts show the numbers of neurons 
positive for GFP and/or PV in two mice (left and middle charts) and averaged 
proportions (right chart). c, Representative ChR2- (left) and ArchT-evoked 
(right) currents recorded from PVINs with somata located in layer 2/3 of the 
dmPFC (1 s light stimulation). d, Representative optically evoked action
potential firing and inhibition of PVINs expressing ChR2 (left, 500-ms blue 
light pulses) or ArchT, respectively (right, 250-ms yellow light pulse during a 
250-pA current pulse injection). e, Left panel, changes in firing frequency of 
PVINs expressing GFP (white dots, n = 7), ChR2 (blue dots, n = 5) or ArchT 
(yellow dots, n = 8) upon injection of increasing current pulses (current pulses 
range, 0-400 pA). No significant differences were observed between groups. 
Right panel, resting membrane potentials of INs expressing GFP (white bar, 
n = 7), ChR2 (blue bar, n = 5) or ArchT (yellow bar, n = 8). No significant
differences were observed between groups (unpaired t-tests). 
Extended Data Figure 4 | Type 2 PVINs mediate conditioned fear
responses. a, z-score transformation of CS+-evoked firing of a non-type-2 IN
for sound pips outside (No freez.) or inside (Freez.) freezing periods during
the extinction session (CS+ 1-12; No freez., 141 pips; Freez., 156 pips). This
neuron was not classified as a type 1 or type 2 IN. b, Left panel, raster plot
illustrating optogenetic identification of the same non-type-2 IN as
ArchT-expressing (that is, PV-expressing). Right panel, mean z-score
transformation of all non-type-2 INs identified as PV-expressing INs (n = 4;
light-pulse duration, 250 ms; 108 stimulation trials). c, z-score transformation
of CS+-evoked firing of a type 2 IN for No freez. and Freez. periods during
the extinction session (CS+ 1-12; No freez., 141 pips; Freez., 156 pips).
d, Raster plot illustrating optogenetic identification of the same type 2 IN as
ArchT-expressing (that is, PV-expressing) (light-pulse duration, 250 ms; 108
stimulation trials). e, CS+-evoked changes in firing rate in two type 2 PVINs
identified with optogenetics, and corresponding freezing scores of the two mice
in which they were recorded (dots, mean z-score 150 ms post CS; bars, blocks
of 4 CS+ presentations each, both during the second extinction session;
light-pulse duration, 250 ms; 108 stimulation trials). Light-induced inhibition
of PVINs, including type 2 INs, reinstated freezing behaviour. Error bars indicate
mean ± s.e.m.
Extended Data Figure 5 | Optogenetic inhibition of prefrontal PVINs
induces place aversion. a, On day 1, GFP- and ArchT-infected mice (n = 11
and 13, respectively) were exposed to a two-compartment place aversion
apparatus during 15 min. Following pre-exposure, the most visited
compartment was selected for each animal. On day 2, systematic yellow-light-
induced inhibition of PVINs was triggered only in the most visited
compartment during a 15-min exposure session. On day 3, GFP and ArchT
infected mice (n = 6 in both cases) were re-exposed to the place aversion
apparatus during 15 min to evaluate the long-term effect of yellow-light
stimulation during day 2. b, Time spent in the most and less visited
compartments on day 1 for individual infected mice (GFP and ArchT).
c, Average percentage of time spent in the most visited compartment on days 1,
2 and 3 for GFP- and ArchT-infected mice. A one-way analysis of variance
(ANOVA) repeated measures performed on values from the GFP or the ArchT
group revealed a significant effect only for the ArchT group (ArchT,
F149 = 4.234, P < 0.05; GFP, F219 = 0.950, P = 0.4191). Post-hoc analysis
revealed that on day 2, light inhibition of PVINs induced an aversion of the
most visited compartment for ArchT infected animals in comparison to day 1
(ArchT mice, day 1 versus day 2, paired t-test, **P < 0.01) and to GFP controls
on day 2 (day 2, ArchT versus GFP, unpaired t-test, *P < 0.05; 250-ms pulses
delivered at 0.9 Hz). On day 3, ArchT mice did not avoid the most visited
compartment any more (ArchT mice, day 2 versus day 3, unpaired t-test,
*P < 0.05). Error bars indicate mean ± s.e.m.
Extended Data Figure 6 | Optogenetic activation of PVINs inhibits
principal neurons and reduces freezing behaviour. a, Raster plots and
peristimulus time histograms illustrating the CS+-evoked excitation of a
representative PN (left panel, Post FC, CS+ presentations 1-4, 108 pips) and its
blockade upon optogenetic-induced activation of PVINs (right panel, CS+
presentations 5-8; light-pulse duration, 250 ms; 108 pips + stimulation trials)
during the Post-FC session. b, z-score-transformed peristimulus time
histogram showing PN inhibition (n = 7) following optogenetic-evoked
activation of PVINs during CS+ presentations (Post-FC session, CS+
presentations 5-8; light-pulse duration, 250 ms; 108 stimulation trials).
c, Freezing behaviour (bars, n = 3 mice, block of 4 CS+) and CS+-evoked firing
changes of PNs (red dots, n = 7 neurons, mean z-score 100 ms post CS) before
and in response to light-induced activation of PVINs during Post-FC sessions
(light pulse duration, 250 ms; 108 stimulation trials; CS+ 1-4 and 5-8,
respectively). Optogenetic activation of PVINs inhibited PNs and reduced
conditioned freezing behaviour (Wilcoxon signed-rank test, *P < 0.05).
d, z-score transformed peristimulus time histogram showing CS+-evoked
excitation of PNs (n = 3) exhibiting antidromic responses to BLA stimulations
(Post-FC, CS− and CS+ presentations 1-4, 108 pips each). These three neurons
were included in the seven neurons for which CS+-evoked excitation was
blocked by light excitation of PVINs (a and b). Error bars indicate
mean ± s.e.m.
Extended Data Figure 7 | Transient amplitude increase and phase reset of
local theta oscillations during fear expression. a, Left panel, power spectrum
of the non-filtered dmPFC LFPs recorded during Post-FC sessions (n = 28
mice) for non-freezing (No freez.) and freezing (Freez.) periods showing a
prominent 8-12-Hz component (that is, theta) only during non-freezing
periods. Right panel, normalized theta power (8-12 Hz) for freezing and non-
freezing periods during Post-FC sessions (n = 28 mice, Wilcoxon signed-rank
test, ***P < 0.001). b, Top panels, non-filtered dmPFC LFP traces selected on
the basis of prominent theta oscillations illustrating the transient increase in
amplitude and phase reset of theta oscillations in response to CS+ (Post-FC,
1 trial). Bottom-left panel, representative dmPFC 8-12-Hz LFP traces
illustrating the phase reset and transient amplitude increase of theta oscillations
in response to CS+ or CS− presentations (Post-FC, 27 pips each). Bottom-right
panel, average ratio of LFP theta power (500 ms post CS or 500 ms pre CS)
in response to CS− and CS+ pips. This analysis revealed a larger transient
increase in LFP upon CS+ presentations (Post-FC, n = 28 mice, CS− versus
CS+, paired t-test, ***P < 0.001). c, Left panel, representative dmPFC LFP
traces filtered in the 8-12-Hz range, illustrating the phase resetting of theta
oscillations during presentations of CS pips associated with no freezing or
freezing behaviour (Post-FC, 27 pips). Right panel, quantification of the
variance of the first theta peak occurrence following pip presentations in
freezing and non-freezing periods (Post-FC, n = 28 mice, No freez. versus
Freez., paired t-test, ***P < 0.001). A small variance corresponds to a strong
theta phase resetting. d, Quantification of the time variance of the first theta
peak following CS− and CS+ presentations or No freez. and Freez. periods for
extinction and retrieval sessions (extinction, CS− presentations and CS+
presentations 1-4, n = 28 mice; retrieval, CS− and CS+, n = 21 mice; CS−
versus CS+, paired t-test, ***P < 0.001; No freez. versus Freez., paired t-test,
***P < 0.001). Error bars indicate mean ± s.e.m.
dipyrromethene boron difluoride (BODIPY), red). b, Experimental design and
mean freezing values of fear conditioned mice (n = 6) before (Test Pre-MUS.),
following (Test MUS.), and one day after (Post-MUS.) injections of MUS in the
MS. Following fear conditioning, targeted inactivation of the MS had no effect
on basal locomotor activity or CS+-evoked freezing responses (paired t-tests).
c, Illustrative raw and filtered (8-120-Hz) LFP traces recorded in the dorsal
CA1 (dCA1) before and following MUS injections in the MS. d, Left panel,
[…] of prefrontal theta oscillations. Left panel, representative dmPFC LFP
traces filtered in the 8-12-Hz range (Test-MUS., first CS+). Right panel,
quantification of the time variance of the first theta peak following CS+
presentations before, following and 1 day after MS inactivation (Pre-MUS.,
MUS., Post-MUS., CS+ presentations 1-4, paired t-tests). MS inactivation had
no effect on dmPFC theta phase resetting upon CS+ presentations. Error bars
indicate mean ± s.e.m.
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Extended Data Figure 10 | PVINs control principal-neuron theta phase 
locking and spiking synchronization. a, Normalized averaged ratio of 
changes in coincident activity between pairs of PNs induced by CS* and CS” 
and corrected for changes in firing rate (Post-FC, Ext. or Ret. sessions; n = 975 
pairs from 308 PNs). Dashed line indicates significant z score (P < 0.05). Bins 
of 30 ms. b, Mean vector length (MRL) and concentration of Von Mises fit (1) 
upon CS” or CS*, two measures of modulation strength in phase with theta 
oscillations (Post-FC, Ext. or Ret. Sessions). Only neurons significantly phase 
locked to theta and for which at least 50 spikes were recorded during CS* were 
included (n = 45) (CS~ versus CS*, Wilcoxon tests, *** P< 0.001). Error bars 
indicate mean + s.e.m.CS™ entrains a stronger locking of PN spikes to ongoing 
theta oscillations. Together with the precise timing between CS" onset 
(resetting) and subsequent theta cycles, this ensures robust, coincident and 
timed spiking of PNs. c, Distribution of log-transformed Rayleigh’s test Z values 
of PN theta modulation before and upon light-induced inhibition (top, n = 41 
neurons) and light-induced activation (bottom, n = 18 neurons) of PVINs 
(light-pulse duration, 250ms; 108 trials for each; yellow light, stimulation at the 
end of the behavioural session; blue light, stimulation during Post-FC session, 
CS* presentations 5-8). Dashed line indicates significant theta phase locking 
threshold (In (Z) = 1.1, P = 0.05). d, Theta modulation of PNs significantly 
phase locked to theta and displaying at least 15 spikes during No light and Light 
conditions. Modulation with local theta was measured with the MRL (top-left 
panel, n = 8 neurons, yellow light stimulation, paired t-tests, No light versus 
Light, *P < 0.05; bottom-left panel, n = 8 neurons, blue light stimulation, No 
light versus Light, *P < 0.05) and κ (top-right panel, n = 8 neurons, yellow 
light stimulation, paired t-tests, No light versus Light, ***P < 0.001; bottom-left 
panel, n = 8 neurons, blue light stimulation, No light versus Light, 
***P < 0.001). Error bars indicate mean ± s.e.m. These results show that 
inhibiting PVINs is both sufficient to increase PNs’ modulation with local theta, 
and necessary for theta entrainment of PNs evoked by CS+. e, Top panel, 
distribution of PNs’ preferred theta phase (n = 308) during cycles around CS. 
The phases of LFPs were aligned to the first theta peak following CS onset to 
mimic phase resetting of local theta (one theta cycle before, and three theta 
cycles following CS were included, bins of 45°). Bottom panel, distribution of 
individual PNs’ preferred theta phases during theta cycles around CS showing 
a synchronization of PNs around the peak of the LFP (Rayleigh’s test for 
circular uniformity, first theta cycle post CS, P < 0.001, indicating that the 
circular distribution is not uniform). f, Top panel, distribution of PNs’ preferred 
theta phase (n = 41) during theta cycles outside light stimulation (left part, 
15.8% freezing) and upon light-induced resetting of theta oscillations (right 
part, 36.8% freezing; one theta cycle before, and 3 theta cycles following CS were 
included, bins of 45°). Bottom panel, distributions of individual PNs’ preferred 
theta phase outside and upon light stimulation. Despite a low number of 
neurons and a moderate freezing induced by light inhibition of dmPFC PVINs 
(36.8% freezing), this analysis revealed that light-induced reset of local theta 
oscillations promotes neuronal synchronization of PNs (Rayleigh’s test for 
circular uniformity, first theta cycle post CS; Light, P< 0.00; No light, P = NS). 
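
The two phase-locking measures named in this legend can be illustrated with a minimal sketch. It assumes the standard definitions (mean resultant length R of unit phase vectors, Rayleigh statistic Z = nR²) and the common first-order approximation P ≈ exp(−Z); the spike phases below are hypothetical, not data from the study. Under that approximation, the threshold ln(Z) = 1.1 corresponds to Z ≈ 3.0 and P ≈ 0.05.

```python
import cmath
import math


def mean_resultant_length(phases_rad):
    """Length of the mean unit vector of the phases (0 = no locking, 1 = perfect locking)."""
    n = len(phases_rad)
    resultant = sum(cmath.exp(1j * p) for p in phases_rad) / n
    return abs(resultant)


def rayleigh_test(phases_rad):
    """Return (Z, approximate P) for Rayleigh's test of circular uniformity."""
    n = len(phases_rad)
    r = mean_resultant_length(phases_rad)
    z = n * r ** 2
    p = math.exp(-z)  # first-order approximation (an assumption of this sketch)
    return z, p


# Hypothetical spike phases: 60 spikes clustered near the theta peak (phase 0) ...
locked = [0.3 * math.sin(k) for k in range(60)]
# ... and 60 spikes spread evenly over the theta cycle.
uniform = [2 * math.pi * k / 60 for k in range(60)]

z_locked, p_locked = rayleigh_test(locked)
z_uniform, p_uniform = rayleigh_test(uniform)

# Threshold used in the legend: ln(Z) = 1.1
z_threshold = math.exp(1.1)
print(f"locked:  MRL = {mean_resultant_length(locked):.3f}, ln(Z) = {math.log(z_locked):.2f}")
print(f"uniform: MRL = {mean_resultant_length(uniform):.3f}")
print(f"threshold Z = {z_threshold:.2f}, approximate P = {math.exp(-z_threshold):.3f}")
```

A neuron would count as significantly phase locked in this sketch when ln(Z) exceeds 1.1.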
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Divergent angiocrine signals from vascular niche 
balance liver regeneration and fibrosis 


Bi-Sen Ding, Zhongwei Cao, Raphael Lis, Daniel J. Nolan, Peipei Guo, Michael Simons, Mark E. Penfold, Koji Shido, 
Sina Y. Rabbany & Shahin Rafii 


Chemical or traumatic damage to the liver is frequently associated 
with aberrant healing (fibrosis) that overrides liver regeneration. 
The mechanism by which hepatic niche cells differentially modulate 
regeneration and fibrosis during liver repair remains to be 
defined. The hepatic vascular niche, predominantly represented by 
liver sinusoidal endothelial cells, deploys paracrine trophogens, 
known as angiocrine factors, to stimulate regeneration. Nevertheless, 
it is not known how pro-regenerative angiocrine signals from liver 
sinusoidal endothelial cells are subverted to promote fibrosis. 
Here, by combining an inducible endothelial-cell-specific mouse gene 
deletion strategy and complementary models of acute and chronic 
liver injury, we show that divergent angiocrine signals from liver 
sinusoidal endothelial cells stimulate regeneration after immediate 
injury and provoke fibrosis after chronic insult. The pro-fibrotic transition 
of the vascular niche results from differential expression of stromal-derived 
factor-1 receptors, CXCR7 and CXCR4 (refs 18-21), in liver 
sinusoidal endothelial cells. After acute injury, CXCR7 upregulation 
in liver sinusoidal endothelial cells acts with CXCR4 to induce 
transcription factor Id1, deploying pro-regenerative angiocrine factors 
and triggering regeneration. Inducible deletion of Cxcr7 in 
sinusoidal endothelial cells (Cxcr7(iΔEC/iΔEC)) from the adult mouse 
liver impaired liver regeneration by diminishing Id1-mediated production 
of angiocrine factors. By contrast, after chronic injury 
inflicted by iterative hepatotoxin (carbon tetrachloride) injection 
and bile duct ligation, constitutive FGFR1 signalling in liver sinusoidal 
endothelial cells counterbalanced CXCR7-dependent pro-regenerative 
response and augmented CXCR4 expression. This predominance of 
CXCR4 over CXCR7 expression shifted angiocrine response of liver 
sinusoidal endothelial cells, stimulating proliferation of desmin+ 
hepatic stellate-like cells and enforcing a pro-fibrotic vascular niche. 
Endothelial-cell-specific ablation of either Fgfr1 (Fgfr1(iΔEC/iΔEC)) or 
Cxcr4 (Cxcr4(iΔEC/iΔEC)) in mice restored the pro-regenerative pathway 
and prevented FGFR1-mediated maladaptive subversion of angiocrine 
factors. Similarly, selective CXCR7 activation in liver sinusoidal 
endothelial cells abrogated fibrogenesis. Thus, we demonstrate 
that in response to liver injury, differential recruitment of 
pro-regenerative CXCR7-Id1 versus pro-fibrotic FGFR1-CXCR4 angiocrine 
pathways in vascular niche balances regeneration and fibrosis. 
These results provide a therapeutic roadmap to achieve hepatic 
regeneration without provoking fibrosis. 

Despite the liver’s capacity to undergo regeneration, chronic or overwhelming 
injury often causes liver fibrosis that culminates in cirrhosis 
and hepatic failure. The integrated process of liver repair includes 
regeneration and wound healing characterized by synthesis of extracellular 
matrix proteins. Both processes are modulated by dynamic 
interplay between parenchymal hepatocytes and non-parenchymal 
cells, including hepatic stellate cells, inflammatory cells, 
biliary epithelial cells and liver sinusoidal endothelial cells (LSECs). 
As such, defining the multicellular crosstalk that balances regeneration 
and dysfunctional (maladaptive) healing holds promise for designing 
treatment for liver diseases. 

LSECs that line liver sinusoidal vasculature induce hepatic organogenesis 
in manners that extend beyond their passive role for metabolite 
delivery. By deploying paracrine growth regulators, which we have 
defined as angiocrine factors, LSECs trigger regeneration of hepatocytes. 
However, aberrant activation of LSECs in the context of 
chronic injury provokes fibrosis. This dichotomy of LSEC niche 
function in mediating liver repair implies that divergent angiocrine signals 
balance regeneration and fibrosis. Therefore, we sought to decipher 
the mechanisms that subvert pro-regenerative capacity of LSECs to a 
pro-fibrotic state. 

In response to tissue injury, cytokines and chemokines, such as stromal- 
derived factor (SDF)-1 (Cxcl12), are upregulated to initiate regeneration 
by switching on its receptors CXCR4 and CXCR7 (refs 18-21). 
Although CXCR4 activation both in haematopoietic and in vascular 
cells modulates angiogenesis and haematopoiesis, expression of another 
SDF-1 receptor, CXCR7, is mainly restricted to endothelial cells, with 
its function primarily believed to be pivotal in vascular patterning and 
tumour neo-angiogenesis. However, elucidating the mechanism by 
which the SDF-1 pathway orchestrates liver repair is hindered by the 
lack of cell-type-specific genetic models in defined settings of liver 
injuries. 

To unravel the divergent role of LSECs in modulating liver repair, 
we used a single injection of carbon tetrachloride (CCl4) and acetaminophen, 
which cause acute liver injury, as well as chronic injury models 
of repeated CCl4 injection and bile duct ligation (BDL) (Fig. 1a). At 
day 2 after single CCl4 injury, CXCR7 was upregulated specifically 
in vascular endothelial (VE)-cadherin+ LSECs (Fig. 1b-e and Supplementary 
Fig. 1). By contrast, CXCR4 is broadly expressed by other 
cell types, and its expression remains relatively stable on LSECs after 
CCl4 injury. Therefore, we have identified CXCR7 as an inducible 
LSEC-specific SDF-1 receptor in response to liver injury. 

Our group and others have shown that after partial hepatectomy 
LSECs, by producing angiocrine factors such as Wnt2 and hepatocyte 
growth factor (HGF), elicit liver regeneration. Activation of transcription 
factor Id1 in LSECs was essential for this process. SDF-1 induced 
Id1 upregulation in cultured human LSECs, which was abrogated by 
genetic silencing of either Cxcr7 or Cxcr4 (Fig. 1f and Supplementary 
Figs 2 and 3). The CXCR7-selective agonist TC14012 similarly induced Id1 
upregulation. Immunoprecipitation-western blot demonstrated that 
after SDF-1 stimulation, CXCR7 was associated with CXCR4 and 
β-arrestin in LSECs (Supplementary Fig. 4). Therefore, SDF-1 stimulates 
Id1 induction through enabling cooperation between CXCR7 and 
CXCR4 (refs 27 and 28). 

To determine the contribution of CXCR7 in LSEC-mediated liver 
repair, we used a tamoxifen-inducible endothelial-cell-specific CreERT2 

Figure 1 | After acute liver injury, upregulation of SDF-1 receptor CXCR7 
in LSECs stimulates angiocrine-mediated regeneration. a, Liver injury 
models for studying the maladaptive transition of pro-regenerative LSEC 
function to a pro-fibrotic vascular niche. b-e, CXCR7 is specifically upregulated 
on VE-cadherin+CD31+ LSECs after acute chemical injury. After injection 
of CCl4, CXCR7 and CXCR4 were determined in isolated LSECs (b), liver 
sections (c, d) and non-parenchymal cells (NPCs) (e). CXCR7 was expressed on 
LSECs but not large vessels; n = 5. Scale bar, 50 µm in Fig. 1; all data hereafter 
are presented as mean ± s.e.m. f, SDF-1 stimulation of human LSECs 
upregulates inhibitor of DNA binding 1 (Id1) protein, a transcription factor 
inducing production of pro-regenerative angiocrine factors. Id1 stimulation by 
SDF-1 in primary human Factor VIII+ LSECs was abrogated by silencing of 
Cxcr4 and Cxcr7 in LSECs; n = 5. g, h, Endothelial cell (EC)-specific inducible 
deletion of Cxcr7 (Cxcr7(iΔEC/iΔEC)) in mice. Mice harbouring loxP sites flanking 
Cxcr7 were crossed with a mouse line with endothelial-cell-specific 
VE-cadherin promoter-driven CreERT2 (VE-Cad-CreERT2/Cdh5-PAC-CreERT2). 


system to knock down Cxcr7 in the endothelial cells of adult mice (Fig. 1g). 
Mice harbouring loxP sites flanking Cxcr7 were crossed with 
VE-Cad-CreERT2/Cdh5-PAC-CreERT2 mice, in which the endothelial-cell-specific 
VE-cadherin promoter drives CreERT2. Mice carrying tdTomato fluorescent 
protein following the floxed stop codon were used to exclude 
off-target effects of VE-Cad-CreERT2 on other liver cell types. Tamoxifen 
injection specifically activated CreERT2 activity in endothelial cells but 
not desmin-expressing stellate-like cells (Fig. 1h and Supplementary 
Fig. 5), demonstrating induced endothelial-cell-specific deletion of 
Cxcr7 (Cxcr7(iΔEC/iΔEC)) in VE-Cad-CreERT2 Cxcr7(loxP/loxP) mice. 
Cxcr7-haplodeficient adult mice (Cxcr7(iΔEC/+)) were used as control for Cre 
toxicity. 

Compared with control mice, hepatocyte proliferation in Cxcr7(iΔEC/iΔEC) 
mice was significantly decreased after CCl4 injury (Fig. 1i, j). Id1-dependent 
deployment of angiocrine factors HGF and Wnt2 from 
LSECs of Cxcr7(iΔEC/iΔEC) mice was reduced both after CCl4- and after 
acetaminophen-induced liver injuries (Fig. 1k). The extent of liver 

Specificity of VE-Cad-CreERT2/Cdh5-PAC-CreERT2 was validated in reporter 
mice carrying tdTomato protein following floxed stop codon. Cxcr7 deletion or 
tdTomato expression in endothelial cells was induced by tamoxifen injection. 
Cxcr7(iΔEC/+) mice served as control. Note the specific expression of tdTomato 
in endothelial cells but not desmin+ stellate-like cells (h, white arrows). 
i-l, Impaired liver regeneration and enhanced liver injury in Cxcr7(iΔEC/iΔEC) 
mice after acute liver injury. Cell proliferation was determined by staining for 
bromodeoxyuridine (BrdU) incorporation (i, j). Expression of Id1 and 
pro-regenerative angiocrine factors, HGF and Wnt2, in LSECs was measured 
after CCl4 and acetaminophen administration (k) and serum alanine 
aminotransferase (ALT) concentration was assessed to determine the degree of 
liver injury (l); n = 4. m, CXCR7 activation in LSECs triggers Id1-mediated 
production of pro-regenerative angiocrine factors. After acute liver injury, 
CXCR7 cooperates with CXCR4, inducing pro-regenerative Id1 pathway in 
LSECs and triggering angiocrine-mediated liver regeneration. Perivascular 
green cells denote stellate-like cells (refs 1-3, 22, 23). 


injury, as determined by the concentration of serum alanine aminotransferase, 
was exacerbated (Fig. 1l and Supplementary Fig. 6). Thus, 
after liver injury, SDF-1 through activation of CXCR7+ LSECs triggers 
an angiocrine response to initiate liver regeneration (Fig. 1m). 
Although hepatocytes regenerate after acute liver injury, chronic 
liver damage more frequently leads to activation of myofibroblasts 
and causes fibrosis. To address how the pro-regenerative angiocrine 
signals of LSECs are diverted to provoke this maladaptive healing, we 
used a mouse model of chronic liver injury by repeated CCl4 injection 
(Fig. 2a, b). The CXCR7-Id1 pathway in LSECs was counterbalanced 
by CXCR4 upregulation after chronic injury (Fig. 2c-e and Supplementary 
Figs 7 and 8). After repeated CCl4 injection, protein concentrations 
of α-smooth muscle actin (SMA) and extracellular matrix protein 
collagen were augmented in Cxcr7(iΔEC/iΔEC) mice compared with control 
mice (Fig. 2f-h and Supplementary Figs 9 and 10). Injection of 
CXCR7-specific agonist TC14012 reduced the upregulation of SMA 
and collagen I in control but not Cxcr7(iΔEC/iΔEC) mice (Fig. 2g, h and 




Supplementary Fig. 11). Therefore, chronic liver injury interferes with 
pro-regenerative CXCR7-Id1 angiocrine pathway in LSECs and pro- 
motes fibrosis. 

The requirement of CXCR7 in LSECs in resolving liver fibrosis was 
tested. After three CCl4 injections, SMA and collagen protein concentrations 
were enhanced in control mice; they peaked at day 8 after the 
last injection and approached the basal (vehicle-injected group) amount at 
day 20 (Fig. 2i-j and Supplementary Figs 12 and 13). By contrast, time-dependent 
resolution of liver injury was impaired in Cxcr7(iΔEC/iΔEC) 
mice. Id1 pathway inhibition by repeated injection of CCl4 was prevented 
by CXCR7 agonist TC14012 (Fig. 2k). Therefore, in response to 
injury, the CXCR7 pathway in LSECs plays an indispensable role in 
stimulating regeneration and resolving fibrosis. After iterative stimuli, 
predominance of the CXCR4 pathway over the CXCR7-Id1 pathway 
in LSECs leads to fibrosis (Fig. 2l). 

We then used BDL, a clinically relevant liver cholestasis model, to 
define how CXCR4 and CXCR7 modulate pro-fibrotic transition of 
LSECs (Fig. 3a). BDL induces biliary epithelial injury and causes continuous 
cholestasis and cirrhosis. After BDL, LSECs were invested by 
perisinusoidal desmin+ stellate-like cells (Fig. 3b). Similar to repeated 
CCl4 injuries, there was temporal upregulation of CXCR4 and 
suppression of CXCR7-Id1 pathways in LSECs (Fig. 3c). SMA deposition 
in the liver of Cxcr7(iΔEC/iΔEC) mice was higher than that of control 
Cxcr7(iΔEC/+) mice (Fig. 3d-f). As such, loss of CXCR7 signalling in 
LSECs leads to fibrosis during BDL-induced cholestatic injury. 
Human LSECs were then stimulated with angiogenic factors VEGF-A 
and FGF-2 to investigate the mechanism whereby CXCR4 expression 
is enhanced to dominate over the CXCR7 pathway. FGF-2, but not 
VEGF-A, increased CXCR4 messenger RNA (mRNA) and protein concentrations, 
and attenuated CXCR7 expression (Fig. 3g and Supplementary 
Fig. 14). Specific inhibition of mitogen-activated protein 
kinase (MAPK) blocked FGF-2-driven CXCR4 induction and CXCR7 
inhibition in LSECs (Fig. 3h). As a result, treatment of human LSECs 
with FGF-2 suppressed Id1 induction by SDF-1 (Fig. 3i, j and Supplementary 
Fig. 15), suggesting that FGF-2-induced CXCR4 upregulation and 
CXCR7 suppression in LSECs negate the Id1 pro-regenerative pathway. 
To test how FGF-2 signalling modulates angiocrine response of 
LSECs, we examined the activation of the FGF-2 receptor FGFR1 
on LSECs after BDL. There was a time-dependent upregulation and 
activation of FGFR1 (refs 16, 29, 30) concomitant with phosphorylation 
of MAPK (Erk1/2) in the injured VE-cadherin+ LSECs (Fig. 3k, l 
and Supplementary Figs 16 and 17). Hence, cholestatic injury causes 


Figure 2 | Iterative hepatotoxic injury perturbs 
CXCR7 pro-regenerative pathway in LSECs and 
forces the generation of a pro-fibrotic vascular 
niche. a, b, Mouse liver fibrosis is induced by 
repeated injection of CCl4. Sirius red staining was 
used to detect collagen in the injured liver. Scale 
bar, 50 µm in Fig. 2. c-e, Chronic liver injury 
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FGFR1-mediated MAPK activation in LSECs, resulting in CXCR4- 
dominated pro-fibrotic transition of angiocrine response during liver 
repair (Fig. 3m). 

To elucidate the mechanism underlying the pro-fibrotic drift of 
LSEC niche, we conditionally ablated Fgfr1 and Cxcr4 in endothelial 
cells of adult mice (Fgfr1(iΔEC/iΔEC) and Cxcr4(iΔEC/iΔEC)) (Fig. 4a). After 
BDL, perisinusoidal expansion of desmin+ cells, deposition of collagen 
and SMA, MAPK activation and CXCR7 suppression in LSECs of 
Fgfr1(iΔEC/iΔEC) mice were all prevented compared with control mice 
(Fig. 4b-g and Supplementary Fig. 18). CXCR7-dependent angiocrine 
expression of Wnt2 and HGF was restored in Fgfr1(iΔEC/iΔEC) mice (Fig. 4e). 
Therefore, endothelial-cell-specific deletion of Fgfr1 in adult mice prevented 
the aberrant transition of LSECs into a pro-fibrotic state by BDL. 

To unravel the altered angiocrine response in chronically injured 
LSECs, we isolated and analysed LSECs from BDL and sham-operated 
mice (Supplementary Fig. 19). In injured LSECs, there was significant 
upregulation of pro-fibrotic factors, including transforming growth 
factor (TGF)-β, bone morphogenetic protein (BMP)2 and platelet-derived 
growth factor (PDGF)-C, concomitant with suppression of 
anti-fibrotic genes, such as follistatin and apelin. This divergent drift 
of angiocrine factor production in LSECs after BDL was diminished in 
Fgfr1(iΔEC/iΔEC) mice, as evidenced by restoration of anti-fibrotic genes 
and reduced expression of fibrotic factors (Fig. 4h). 


The extent of fibrosis after BDL was further tested in Cxcr4(iΔEC/iΔEC) 
mice. Compared with control Cxcr4(iΔEC/+) mice, hepatic deposition 
of SMA and collagen, and perisinusoidal accumulation of desmin+ 
stellate-like cells, were reduced in Cxcr4(iΔEC/iΔEC) mice after BDL 
(Fig. 4i-m and Supplementary Fig. 20). The reduction of liver fibrosis 
in Cxcr4(iΔEC/iΔEC) mice implies that constitutive CXCR4 activation in 
LSECs by chronic injury establishes a pro-fibrotic vascular niche, 
activating adjacent myofibroblast cells and provoking fibrogenesis 
(Fig. 4n). 

Although liver regeneration after partial hepatectomy proceeds 
impeccably without fibrosis, liver repair after chronic injury is associated 
with fibrosis that compromises restoration of liver function. Therefore, 
identifying the molecular pathways modulating liver regeneration and 
aberrant healing will open up therapeutic avenues for treatment of liver 
cirrhosis and failure. We have shown that after 70% partial hepatectomy, 
activation of the VEGFR2-Id1 pathway in LSECs leads to liver 
regeneration. Here we demonstrate that FGFR1-mediated CXCR4 
upregulation and CXCR7 suppression in LSECs counterbalance the 
pro-regenerative function of LSECs and lead to fibrosis. 

We used complementary acute and chronic injury models to decipher 
the contribution of LSECs in liver repair (Fig. 1a). In the mouse liver, we 
have identified a preferential induction of pro-regenerative CXCR7- 
dependent signalling in LSECs that responds to acute injury (Fig. 1). 
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Figure 4 | FGFR1 activation of CXCR4 in LSECs provokes pro-fibrotic 
angiocrine signals in liver repair. a, Endothelial-cell-specific inducible 
deletion of Fgfr1 and Cxcr4 (Fgfr1(iΔEC/iΔEC) and Cxcr4(iΔEC/iΔEC)) in adult mice. 
b, c, Reduced liver fibrosis in Fgfr1(iΔEC/iΔEC) mice. Compared with control mice, 
perisinusoidal enrichment of desmin+ stellate-like cells (b, white arrow) 
and collagen deposition (c, Sirius red staining) were diminished in the 
liver of Fgfr1(iΔEC/iΔEC) mice after BDL; n = 4. Scale bar, 50 µm in Fig. 4. 
d-g, Endothelial-cell-specific deletion of Fgfr1 in mice prevents 
CXCR4-mediated maladaptive transition after BDL and restores regenerative 
angiocrine signals. In BDL-injured LSECs of Fgfr1(iΔEC/iΔEC) mice, CXCR7 
suppression and CXCR4 upregulation (d) and Erk1/2 activation in 
VE-cadherin+ LSECs (e-g) were reduced. This was accompanied by restored 
production of hepatic-active angiocrine factors HGF and Wnt2; n = 4. 
h, Pro-fibrotic production of angiocrine factors in LSECs is reduced in 
Fgfr1(iΔEC/iΔEC) mice. BDL instigated divergent production of angiocrine factors 
in LSECs, including upregulation of factors in BMP and TGF-β pathways and 


During chronic liver injury, loss of CXCR7 and upregulation of CXCR4 
in LSECs cause progression to fibrosis. Indeed, the critical function of 
CXCR7 activation in promoting regeneration and counteracting fibrosis 
is evidenced by the attenuation of fibrosis by selective activation of 
CXCR7 in LSECs, as well as by impaired regeneration (Fig. 1) and 
enhanced fibrogenesis (Figs 2 and 3) in Cxcr7(iΔEC/iΔEC) mice. 

The shift in SDF-1 signalling from the CXCR7-dependent pro-regenerative 
response to a CXCR4-dominated pro-fibrotic function 
in LSECs is due to persistent FGFR1 that drives chronic MAPK activation 
(Fig. 3). We used an inducible endothelial-cell-specific 
mouse genetic deletion system to demonstrate the contribution of 
FGFR1-CXCR4 pathway in the pro-fibrotic drift of LSECs. After chronic 
liver injury, the enhanced CXCR4 expression relative to CXCR7 in 
LSECs was prevented in Fgfr1(iΔEC/iΔEC) mice, and both Fgfr1(iΔEC/iΔEC) 
and Cxcr4(iΔEC/iΔEC) mice were resistant to fibrosis (Fig. 4). Therefore, 

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suppression of anti-fibrotic genes such as follistatin and apelin (Supplementary 
Fig. 19). This pro-fibrotic drift of angiocrine factor in LSECs after BDL was 
mitigated in Fgfr1(iΔEC/iΔEC) mice; n = 4. i-m, Reduction of liver fibrosis in 
Cxcr4(iΔEC/iΔEC) mice after BDL. The extent of liver fibrosis after BDL was 
significantly lower in Cxcr4(iΔEC/iΔEC) mice than that of control mice, as 
evidenced by decreased deposition of collagen (i, j, Sirius red staining), SMA 
protein (k, l), and perisinusoidal enrichment of desmin+ stellate-like cells 
(m, white arrow); n = 5. n, Divergent angiocrine signals from LSECs balance 
liver regeneration and fibrosis. After acute liver injury, activation of 
CXCR7-Id1 pathway in LSECs stimulates production of hepatic-active 
angiocrine factors. By contrast, chronic injury causes persistent FGFR1 
activation in LSECs that perturbs CXCR7-Id1 pathway and favours a 
CXCR4-driven pro-fibrotic angiocrine response, thereby provoking liver 
fibrosis. Therefore, in response to injury, differentially primed LSECs deploy 
divergent angiocrine signals to balance liver regeneration and fibrosis. 


activation of CXCR7-Id1 in LSECs upon injury triggers and safeguards 
production of hepatogenic angiocrine factors, whereas persistent 
FGFR1-CXCR4 activation by chronic stimuli converts LSECs to a pro-fibrotic 
niche. This aberrant angiocrine response of LSECs by the 
FGFR1-CXCR4 pathway causes activation and expansion of desmin+ 
stellate-like cells. How hepatic stellate cells reciprocally modulate 
phenotypic and functional contributions of LSECs in liver repair 
remains to be investigated. Moreover, whether activation of CXCR7 
and CXCR4 on LSECs regulates inflammatory responses, such as recruitment 
of macrophages that could potentially modulate liver regeneration 
and fibrosis, needs to be studied. 

In summary, we have found that differentially activated LSECs 
supply divergent angiocrine signals for liver repair. Selective activation 
of CXCR7 in LSECs is instrumental in shepherding angiocrine-mediated 
regeneration. Perturbation of the CXCR7 pathway by constitutive FGFR1 
activation diverts SDF-1 signalling in LSECs to a CXCR4-dominated 
maladaptive (pro-fibrotic) angiocrine response. Taken together, we 
demonstrate that endothelial cells are not just inert cellophane conduits 
delivering metabolites but also establish a vascular niche that 
instructively dictates regeneration and healing. Identifying molecular 
pathways orchestrating divergent angiocrine responses in the hepatic 
vascular niche will lay the foundation for therapeutic strategies that 
ensure liver repair without causing fibrosis. 


METHODS SUMMARY 


Generation of endothelial-cell-specific inducible gene deletion was performed by 
treating VE-Cad-CreERT2/Cdh5-PAC-CreERT2-harbouring mice with tamoxifen. 
Mice carrying loxP sites flanking Cxcr7 were provided by ChemoCentryx (generated 
by L. Gan (University of Rochester)). Floxed Cxcr4 mice were provided by Y.-R. 
Zou (the Feinstein Institute for Medical Research, Manhasset, New York). Sex-, 
age- and weight-matched littermate animals with indicated genotypes were used 
and compared in all experimental groups. All animal experiments were performed 
under the guidelines set by the Institutional Animal Care and Use Committee at 
Weill Cornell Medical College. 

Single and repeated injections of CCl4 were used to induce acute and chronic 
liver injuries, respectively. To perform BDL, the common bile duct was ligated 
and severed by incision between the ligations. Livers were collected for the analysis of 
fibrogenesis, including collagen deposition by Sirius red staining, and desmin 
protein by immunoblot (Abcam). TC14012 (R&D) was injected intraperitoneally 
into the mice after CCl4 injury every other day at 30 mg kg−1. LSECs were isolated 
from mice as previously described. Human LSECs were obtained from ScienCell™ 
Research Laboratories. Tissues were collected and cryopreserved as previously 
described. Antibodies characterizing liver vasculature and fibroblasts were incubated 
with cryosections. Flow cytometry analysis of LSECs was performed on liver 
non-parenchymal cells. The sample size (n) of each experimental group is 
described in each corresponding figure legend, and all experiments were repeated 
at least three times. Statistical analysis used two-way analysis of variance. All data 
are presented as mean ± s.e.m. 
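
As a minimal sketch of the summary statistic reported throughout, the standard error of the mean can be computed as the sample standard deviation divided by the square root of n. The function name and the readout values below are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, s.e.m.), where s.e.m. = sample s.d. / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical readout for one experimental group of n = 4 animals (arbitrary units).
group = [2.0, 4.0, 4.0, 6.0]
mean, sem = mean_sem(group)
print(f"{mean:.2f} ± {sem:.2f} (n = {len(group)})")
```

With only four or five animals per group, the s.e.m. is sensitive to single outliers, which is one reason the group size is reported alongside each value.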


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
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METHODS 


Endothelial cell (EC)-specific gene deletion strategy. Inducible endothelial-cell-specific 
gene deletion was achieved by treating VE-Cadherin-CreERT2-harbouring 
mice with tamoxifen. Cxcr4(loxP/loxP) mice were previously described. The Cre+ 
mice were treated with tamoxifen at a dose of 250 mg kg−1 intraperitoneally for 
6 days interrupted for 3 days after the third dose. After 3 weeks of tamoxifen 
treatment, deletion of target genes in LSECs was corroborated by quantitative 
PCR and immunoblot analysis. All animal experiments were performed under 
the guidelines set by the Institutional Animal Care and Use Committee at Weill 
Cornell Medical College, using sex-, age- and weight-matched littermate animals. 
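
The tamoxifen regimen above can be laid out as a day-by-day schedule. The sketch below assumes one reading of the text, namely six once-daily doses with a 3-day pause inserted after the third dose; the function name, its parameters and the day numbering are illustrative assumptions, not part of the published protocol.

```python
def dosing_days(total_doses=6, pause_after=3, pause_days=3):
    """Return the 1-based study days on which a dose is given.

    Assumes once-daily dosing with a single pause of `pause_days`
    inserted after dose number `pause_after`.
    """
    days = []
    day = 1
    for dose in range(1, total_doses + 1):
        days.append(day)
        day += 1
        if dose == pause_after:
            day += pause_days  # skip the rest days before resuming
    return days


schedule = dosing_days()
print(schedule)  # [1, 2, 3, 7, 8, 9]
```

Under this reading the regimen spans nine days, with doses on days 1 to 3 and 7 to 9.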
Liver injury and fibrosis models. Single and repeated injections of CCl4 were 
used to induce acute and chronic liver injuries, respectively, as previously described. 
CCl4 was diluted in oil to yield a final concentration of 40% (0.64 mg ml−1) and 
injected into mice at 1.6 mg kg−1 body mass. Eight- to ten-week-old mice were 
subjected to BDL. Acute liver injury was also induced in mice by intraperitoneal 
injection of 400 mg kg−1 acetaminophen. To perform BDL, mice were subjected 
to a mid-abdominal incision 3 cm long, under general anaesthesia. The common 
bile duct was ligated in two adjacent positions approximately 1 cm from the porta 
hepatis. The duct was then severed by incision between the two sites of ligation. 
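
The per-animal amounts implied by these body-mass-scaled doses follow from simple arithmetic, sketched below. The 25 g body mass and the helper names are hypothetical; the dose and stock figures are taken from the text as printed. Because the injection volume depends only on the ratio of dose to stock concentration, it is unchanged so long as both figures are read in the same mass unit.

```python
def dose_mg(dose_mg_per_kg, body_mass_g):
    """Absolute dose in mg for an animal of the given body mass."""
    return dose_mg_per_kg * body_mass_g / 1000.0


def injection_volume_ml(dose_mg_per_kg, body_mass_g, stock_mg_per_ml):
    """Volume of stock solution that delivers the body-mass-scaled dose."""
    return dose_mg(dose_mg_per_kg, body_mass_g) / stock_mg_per_ml


body_mass_g = 25.0  # hypothetical adult mouse

# CCl4: 1.6 mg per kg from a 0.64 mg per ml stock, i.e. 2.5 ml per kg body mass.
ccl4_ml = injection_volume_ml(1.6, body_mass_g, 0.64)

# Acetaminophen: 400 mg per kg body mass.
apap_mg = dose_mg(400.0, body_mass_g)

print(f"CCl4 injection volume: {ccl4_ml:.4f} ml")
print(f"Acetaminophen dose: {apap_mg:.1f} mg")
```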

To selectively activate CXCR7, the agonist TC14012 (R&D Systems) was intraperitoneally 
injected into the mice after CCl4 injury or BDL injury every other day 
at 30 mg kg−1. At indicated time points, mice were killed and whole liver tissues 
were collected for the analysis of fibrogenesis, including collagen deposition by 
Sirius red staining and deposition of SMA and collagen I protein detected by 
immunoblot (Abcam). 

Isolation and culture of mouse liver cells. Liver cells were isolated from mice by a 
two-step collagenase perfusion technique with modifications, as previously described. 
Briefly, the liver was perfused with Liver Perfusion Medium (Invitrogen), and 
dissociated by Liver Digest Medium (Invitrogen). The non-parenchymal cells were 
fractionated by Percoll gradient centrifugation with 75% stock Percoll solution 
and 35% stock Percoll solution. The LSEC fraction was isolated by mouse LSEC-binding 
magnetic beads (Miltenyi) and Dynabeads Magnetic Beads conjugated with 
anti-mouse VEGFR3 antibody (Imclone, NY). Expression of Id1, CXCR4, CXCR7 and 
FGFR1 protein and mRNA was determined from isolated LSECs. For detection 
of FRS-2 phosphorylation in LSECs, mice were perfused with phosphatase inhibitor 
(Pierce) before collecting the tissues. 

Culture and stimulation of human LSECs. Human LSECs obtained from 
ScienCell™ Research Laboratories were cultured following the vendor’s instructions. 
The expression of CXCR7, VE-cadherin, vWF and factor VIII was validated by 
immunostaining or flow cytometric analysis. To selectively knock down Cxcr4 or 
Cxcr7 in LSECs, shRNA lentiviruses were generated by cotransfecting 15 µg of 
shuttle lentiviral vector containing target gene or scrambled shRNA, 3 µg of pENV/VSV-G, 
5 µg of pRRE and 2.5 µg of pRSV-REV in 293T cells by the calcium 
precipitation method. Viral supernatants were concentrated by ultracentrifugation 
and used to transduce human LSECs. 

To determine the expression of Id1, CXCR4 and CXCR7 in LSECs after cytokine 
stimulation, 500,000 LSECs were seeded and treated with Cxcr4, Cxcr7 or scrambled 
shRNA lentiviruses, respectively. After starving in serum-free medium, seeded 
LSECs were stimulated with 10 ng ml−1 SDF-1 or 20 ng ml−1 FGF-2. At various 
time points, cells were collected for the measurement of Id1 protein and mRNA 
expression. Treatment with 30 µM U0126 was used to inhibit the activity of MAPK. 
Activation of MAPK (p-Erk1/2) was assayed by immunoblot using antibodies 
against p-Erk1/2 and total Erk1/2 (Cell Signaling Technology). 

For immunoprecipitation-western blot, cell lysates were retrieved by RIPA lysis 
buffer with protease inhibitor cocktail and phosphatase inhibitor (Pierce) and 
incubated with anti-CXCR7 antibody (R&D Systems) conjugated with Protein 
A/G beads (Invitrogen). Beads were retrieved by magnet, associated proteins were 
eluted, and the association of β-arrestin, CXCR4, and CXCR7 was determined by 
western blot (Santa Cruz), after normalization to total CXCR7 protein amounts in 
cell lysates (input). 

Flow cytometric analysis of liver non-parenchymal cells and LSECs. For flow 
cytometry analysis, retrieved livers from animals were minced, dissociated in liver 
digestion medium (Invitrogen), and filtered through a 30-1m strainer. Single-cell 
suspensions were preblocked with Fc block (CD16/CD32; BD Biosciences) and 
then incubated with the following primary antibodies recognizing mouse LSECs 
and haematopoietic cells,: rat IgG2ax and IgG2af isotype control; CD31/PECAM- 
1 (clone MEC 13.3, eBioscience); VE-cadherin/CD 144 (clone Bv13, eBioscience); 
CXCR7 (clone 11G8, R&D Systems). Primary antibodies were directly conjugated 
to various Alexa Fluor dyes or Quantum Dots using antibody labelling kits (Invitrogen) 
performed as per the manufacturer’s instructions. In the case of Alexa Fluor 750, 
conjugations were performed using succinimidyl esters and purified over BioSpin 
P30 Gel (Bio-Rad). 

Labelled cell populations were measured by a LSRII flow cytometer (Becton 
Dickinson); compensation for multivariate experiments was performed with 
FACS Diva software. Flow cytometry analysis was performed using a variety of 
controls such as isotype antibodies, and unstained samples for determining appro- 
priate gates, voltages, and compensations required in multivariate flow cytometry. 
Immunostaining and histological analysis of liver cryosection. To collect tis- 
sues for histological analysis, mice were perfused with 4% PFA, cryopreserved, and 
snap frozen in OCT. For immunofluorescent microscopy, the liver sections 
(10 µm) were blocked (5% donkey serum/0.3% Triton X-100) and incubated in 
primary Abs: anti-VE-cadherin polyclonal Ab (pAb, 2 µg ml⁻¹, R&D Systems), 
anti-CD31 mAb (MEC13.3, 5 µg ml⁻¹, BD Biosciences), anti-CXCR7 mAb (5 µg ml⁻¹, 
R&D Systems), anti-desmin (pAb, 2 µg ml⁻¹, Abcam) and anti-p-Erk1/2 antibody 
(2 µg ml⁻¹, Cell Signaling Technology). After incubation in fluorophore-conjugated 
secondary antibodies (2.5 µg ml⁻¹, Jackson ImmunoResearch), sections were 
counterstained with TOPRO3 or DAPI (Invitrogen). 

Liver cell proliferation in vivo was measured by BrdU uptake. Briefly, mice 
received a single dose of BrdU (Sigma) intraperitoneally 60 min before death 
(50 mg kg⁻¹). Liver lobes were removed, weighed and further processed. Slices 
were preincubated with 1 M HCl at room temperature for 1 h, neutralized with 
10 mM Tris (pH 8.5) at room temperature for 15 min, and stained using the BrdU 
Detection System (BD Biosciences) and fluorophore-conjugated secondary antibodies 
(2.5 µg ml⁻¹, Jackson ImmunoResearch). For immunohistochemical (IHC) detection 
of BrdU, endogenous peroxidase and nonspecific protein block (5% BSA, 10% 
donkey serum and 0.02% Tween-20) were performed on liver cryosections and 
incubated with secondary pAb and streptavidin horseradish peroxidase (Jackson 
ImmunoResearch). 

Image acquisition and analysis. IHC staining of liver slides was captured with 
Olympus BX51 microscope (Olympus America), and fluorescent images were 
recorded on AxioVert LSM710 confocal microscope (Zeiss). Co-staining of VE- 
cadherin with CXCR4 and CXCR7 was also determined. 

Gene expression analysis by real-time polymerase chain reaction. Total RNA 
was extracted from cryopreserved liver tissue or isolated LSECs using the RNeasy kit 
(Qiagen). After isolation, 500 ng of total RNA was transcribed into complementary 
DNA by using the SuperScript reverse transcriptase kit (Invitrogen). The 
detection of complementary DNA expression for the specific genes was performed 
using quantitative polymerase chain reaction. 


31. Wang, Y. et al. Ephrin-B2 controls VEGF-induced angiogenesis and 
lymphangiogenesis. Nature 465, 483-486 (2010). 

32. Nie, Y. et al. The role of CXCR4 in maintaining peripheral B cell compartments 
and humoral immunity. J. Exp. Med. 200, 1145-1156 (2004). 
Human body-surface epithelia coexist in close association with 
complex bacterial communities and are protected by a variety of 
antibacterial proteins. C-type lectins of the RegIII family are 
bactericidal proteins that limit direct contact between bacteria and the 
intestinal epithelium and thus promote tolerance to the intestinal 
microbiota. RegIII lectins recognize their bacterial targets by 
binding peptidoglycan carbohydrate, but the mechanism by which 
they kill bacteria is unknown. Here we elucidate the mechanistic 
basis for RegIII bactericidal activity. We show that human RegIIIα 
(also known as HIP/PAP) binds membrane phospholipids and kills 
bacteria by forming a hexameric membrane-permeabilizing oligomeric 
pore. We derive a three-dimensional model of the RegIIIα 
pore by docking the RegIIIα crystal structure into a cryo-electron 
microscopic map of the pore complex, and show that the model accords 
with experimentally determined properties of the pore. 
Lipopolysaccharide inhibits RegIIIα pore-forming activity, explaining why 
RegIIIα is bactericidal for Gram-positive but not Gram-negative 
bacteria. Our findings identify C-type lectins as mediators of 
membrane attack in the mucosal immune system, and provide detailed 
insight into an antibacterial mechanism that promotes mutualism 
with the resident microbiota. 

RegIIIα damages the surfaces of Gram-positive bacteria, indicating 
that RegIIIα might target bacterial membranes. We assessed the capacity 
of RegIIIα to permeabilize bacterial membranes by quantifying 
bacterial uptake of a membrane-impermeant fluorescent dye (SYTOX 
green). RegIIIα increased SYTOX green uptake when added to the 
Gram-positive species Listeria monocytogenes, indicating damaged 
membranes (Fig. 1a, b). RegIIIα has an anionic amino-terminal 
pro-segment that inhibits bactericidal activity (but not peptidoglycan 
binding) by docking to the protein core through charge–charge interactions. 
The pro-segment is removed by trypsin on secretion into the intestinal 
lumen, yielding bactericidally active RegIIIα (ref. 4). Bactericidally 
inactive pro-RegIIIα did not induce SYTOX green uptake, indicating 
minimal membrane permeabilization (Fig. 1a). Thus, RegIIIα 
permeabilizes the bacterial membrane, and the pro-segment inhibits this activity. 

To test directly for membrane disruption by RegIIIα we used liposomes 
composed of 85% zwitterionic phospholipid (PC) and 15% acidic 
phospholipid (PS). The liposomes encapsulated carboxyfluorescein, a 
fluorescent dye. RegIIIα induced rapid dye efflux from PC/PS liposomes 
(Fig. 1c), which was reduced when PC-only liposomes were used 
(Fig. 1d, e). This indicates a preference for acidic phospholipids that is 
consistent with the acidic lipid content of bacterial membranes and 
with the salt sensitivity of RegIIIα membrane toxicity (Extended Data 
Fig. 2a, b). These findings indicate that RegIIIα interactions with lipid 
bilayers are mediated by electrostatic interactions. pro-RegIIIα yielded 
a diminished rate of dye release (Fig. 1f), indicating that the pro-segment 
inhibits membrane permeabilization. 
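
The dye efflux traces are reported as a percentage of the maximal release obtained with detergent (Fig. 1c). The normalization can be sketched as follows. This is a minimal illustration, assuming that the baseline is the fluorescence recorded before protein addition; the function name and the example readings are hypothetical and are not data from this study.

```python
def percent_dye_release(f_t, f_baseline, f_detergent):
    """Express a fluorescence reading as % of detergent-induced maximal release.

    f_baseline: reading before protein addition (assumed definition of 0%).
    f_detergent: reading after detergent lysis of all liposomes (100%).
    """
    span = f_detergent - f_baseline
    if span <= 0:
        raise ValueError("detergent reading must exceed the baseline reading")
    return 100.0 * (f_t - f_baseline) / span


# Hypothetical readings in arbitrary fluorescence units (illustration only).
trace = [100.0, 250.0, 400.0, 520.0]
normalized = [
    percent_dye_release(f, f_baseline=100.0, f_detergent=700.0) for f in trace
]
print(normalized)
```

Normalizing to the detergent end point makes traces from different liposome preparations comparable even when absolute fluorescence differs.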


We next assessed RegIIIα lipid-binding activity by measuring changes 
in the intrinsic fluorescence of tryptophan residues. We observed increased 
tryptophan fluorescence intensity only when RegIIIα was added to 
PS-containing liposomes (Fig. 1g-i), indicating that RegIIIα interacts 
with acidic phospholipids. Furthermore, we observed fluorescence 
resonance energy transfer (FRET) between donor RegIIIα tryptophan 
residues and dansyl-labelled PC/PS liposomes (Fig. 1j, k). FRET was 
inhibited by the pro-RegIIIα N-terminal pro-segment (Fig. 1j, k), 
indicating that the pro-segment inhibits bactericidal activity by hindering 
lipid binding. Consistent with its inability to bind lipids, pro-RegIIIα 
did not inhibit RegIIIα bactericidal activity in mixing experiments 
(Extended Data Fig. 2c). 

Several membrane-active toxins destabilize membranes by forming 
monomeric or multimeric pores. To test for RegIIIα pores, we 
performed conductance studies in black lipid membranes, a model system 
that mimics the properties of a cell membrane. RegIIIα produced rapid 
single-channel-like currents at −80 mV in the presence of Mg²⁺ ions 
(Fig. 2a), with no current detected at 0 mV. Using the Nernst-Planck 
equation we estimated the diameter of the pore at ~12-14 Å (Extended 
Data Fig. 3). The calculated pore size agreed with the lack of efflux 
of fluorescein isothiocyanate-dextran-10 (FD10) or FD4, which have 
Stokes diameters of ~44 Å and ~28 Å, respectively (Fig. 2b). In 
contrast, carboxyfluorescein (~10 Å) passed readily through the pores 
(Figs 1c and 2b). These results show that RegIIIα forms functional 
transmembrane pores and yield an estimate of the inner pore diameter. 
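
The size-exclusion argument above can be restated as a simple comparison of probe and pore diameters. The sketch below uses the approximate Stokes diameters quoted in the text and a purely steric pass/fail criterion, which is a simplifying assumption (it ignores charge and probe flexibility); the function names are hypothetical.

```python
# Approximate Stokes diameters quoted in the text, in angstroms.
STOKES_DIAMETER_A = {"FD10": 44.0, "FD4": 28.0, "carboxyfluorescein": 10.0}


def can_pass(probe_diameter_a, pore_diameter_a):
    """Purely steric criterion: a probe passes if it is narrower than the pore."""
    return probe_diameter_a < pore_diameter_a


def permeant_probes(pore_diameter_a):
    """Names of probes predicted to cross a pore of the given inner diameter."""
    return sorted(
        name
        for name, diameter in STOKES_DIAMETER_A.items()
        if can_pass(diameter, pore_diameter_a)
    )


# Pore diameters estimated from the conductance measurements (~12-14 angstroms).
for pore in (12.0, 14.0):
    print(pore, permeant_probes(pore))
```

Under this criterion both conductance-derived estimates admit carboxyfluorescein and exclude FD4 and FD10, which is the release pattern reported in Fig. 2b.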

When visualized by negative-stain electron microscopy (EM), numerous 
circular structures of ~100 Å diameter were observed on liposomes 
incubated with RegIIIα (Fig. 2c and Extended Data Fig. 4a). Although 
RegIIIα is a monomer in solution, the size of the pores suggested that 
they were RegIIIα multimers. We therefore treated liposome-associated 
RegIIIα with a crosslinking agent, solubilized the products in detergent, 
and separated them by size-exclusion chromatography (Fig. 2d). In 
addition to a prominent monomer peak we detected a second, 
liposome-dependent peak at a lower retention volume, indicating the formation 
of a multimeric complex. Western blotting showed a single RegIIIα 
species with mobility similar to that predicted for a hexamer (Fig. 2d), 
suggesting that the pore was a RegIIIα hexamer. 

After longer incubations with lipid, RegIIIα formed filaments (Extended 
Data Fig. 4b) similar to those in pancreatic secretions. The filaments 
were ~100 Å in diameter, correlating with the dimensions of the 
RegIIIα pore (Fig. 2c). RegIIIα filamentation required lipid and was 
dependent on RegIIIα pore formation, as pro-RegIIIα formed neither 
pores nor filaments (Extended Data Fig. 4b, d). Filamentation partially 
inhibited the ability of RegIIIα to permeabilize membranes (Extended 
Data Figs 4c and 5a-c), as observed with other membrane toxic host 
defence proteins where filamentation traps pore complexes and limits 
damage to host cells. These findings indicate that the RegIIIα filaments 
Figure 1 | RegIIIα permeabilizes the bacterial membrane. a, Listeria 
monocytogenes was treated with 25 µM RegIIIα, pro-RegIIIα, or BSA or left 
untreated, and bacterial uptake of SYTOX green was measured. Results are 
representative of three independent experiments, and are expressed as a 
percentage of maximum SYTOX uptake in the presence of 0.2% SDS. b, SYTOX 
green uptake by L. monocytogenes in the presence of increasing RegIIIα 
concentrations. Assays were performed in triplicate. Means ± s.e.m. are 
plotted. c, Carboxyfluorescein (CF)-loaded liposomes (10 µM lipid; 85% PC/ 
15% PS) were treated with 1 µM RegIIIα. 1.0% octylglucoside (OG) was added 
towards the end to disrupt remaining liposomes. Dye efflux is expressed as 
percentage of maximal release by detergent. Results are representative of five 
independent experiments. d, 10 µM RegIIIα was added to carboxyfluorescein- 
loaded liposomes (100 µM lipid; 100% PC, 100% PS or 85% PC:15% PS), and 
dye efflux was monitored over time. Representative results are shown. 


are higher-order assemblies of RegIIIα pore complexes and show that 
filamentation limits RegIIIα toxicity. 

Although the ~90-kDa RegIIIα pore complex was too small for 
structure determination by single-particle cryoelectron microscopy 
(cryoEM) methods, the RegIIIα filaments were sufficiently large 
for such analysis. We therefore reconstructed a three-dimensional 
map of the RegIIIα filament and extracted the structure of the minimal 
pore complex (Fig. 3a, b and Extended Data Fig. 6a-f). The nominal 
resolution of our structure, 9.2 Å, was limited by symmetry variability 
and filament bending (Extended Data Fig. 6g-j and Supplementary 
Information). Consistent with our crosslinking studies (Fig. 2d), the 
minimal pore was a hexamer formed by three RegIIIα dimers related 
by helical symmetry. The outer diameter of the pore assembly was 
89 Å, as observed by negative-stain EM (Fig. 2c). The pore height 
was 55 Å, sufficient to span a lipid bilayer (35-45 Å). The inner 
diameter was ~18 Å, consistent with the pore size predicted by our 
conductance measurements (Extended Data Fig. 3) and dye release 
assays (Fig. 2b). 
e, Averaged results from three independent replicates of the experiment 
shown in d. NS, not significant; **P < 0.01; ***P < 0.001. f, Initial rate of 
liposome dye efflux (100 µM lipid) as a function of RegIIIα and pro-RegIIIα 
concentration. Results are representative of three independent experiments. 
*P < 0.05; **P < 0.01. g, Intrinsic tryptophan fluorescence of 1 µM RegIIIα 
was measured in the presence of increasing lipid concentrations. h, Tryptophan 
fluorescence of 1 µM RegIIIα and pro-RegIIIα as a function of lipid 
concentration. i, Intrinsic tryptophan fluorescence of 1 µM RegIIIα was 
measured in the presence of liposomes (100 µM lipid) of varying lipid 
composition. j, 5.0 µM RegIIIα or pro-RegIIIα was added to liposomes (100 µM 
lipid) incorporating 5% dansyl-PE and dansyl fluorescence was monitored. 
Assays were performed in triplicate. k, FRET efficiency as a function of RegIIIα 
and pro-RegIIIα concentration. Assays were performed in triplicate. 
Means ± s.e.m. are plotted. 


RegIIIα, like other epithelial bactericidal proteins such as α-defensins, 
is constrained by disulphide bonds that prohibit large secondary structure 
changes on moving from an aqueous to an apolar environment. 
This suggested the feasibility of docking the three-dimensional structure 
of the RegIIIα monomer into the EM density map to model the 
organization of the pore complex further. First, we determined the crystal 
structure of processed, bactericidally active RegIIIα (Extended Data 
Fig. 7a) and compared it to the previously determined structure of 
bactericidally inactive pro-RegIIIα. The two structures were similar, 
although the amino acid side chains of the loop encompassing residues 
93-99 (sequence KSIGNSY) adopted different orientations in the active 
RegIIIα structure (Fig. 3c). This was consistent with the conformational 
flexibility of this loop as indicated by a higher crystallographic 
B-factor (Extended Data Fig. 7b). 

The active RegIIIα structure could be docked into the cryo-EM 
hexameric density map (Fig. 3d and Extended Data Fig. 6k, l), providing 
good spatial constraints for building a hexameric model. The 
model indicates that the RegIIIα subunits in the pore assembly are 
Figure 2 | RegIIIα forms a transmembrane pore. a, RegIIIα-dependent 
current flow across a planar lipid bilayer is depicted as a function of time. 
No current was observed before the application of a voltage across the 
membrane. Upon the application of −80 mV, inward current was observed, 
and returning the membrane potential to zero diminished the current because 
the measured reversal potential was −4.0 mV. The current trace is 
representative of multiple independent experiments. b, Liposomes loaded with 
FITC-Dextran 10 (FD10), FITC-Dextran 4 (FD4), or carboxyfluorescein (CF) 
were treated with 5.0 µM RegIIIα and dye release was monitored over time. 
1% OG (octylglucoside) was added to disrupt the liposomes towards the end of 
the experiment. c, Negative-stain electron microscopy (EM) images of RegIIIα 
in the presence of lipid bilayers. An individual RegIIIα pore particle is shown 
in the right-hand panel. d, RegIIIα (100 µM) in the presence or absence of 
liposomes (1 mM lipid) was crosslinked with 5 mM 1-ethyl-3-[3- 
dimethylaminopropyl] carbodiimide hydrochloride (EDC). Crosslinked 
complexes were solubilized in detergent, resolved by size-exclusion 
chromatography, and analysed by western blotting with anti-RegIII antibody. 
The predicted mobilities of RegIIIα dimers, tetramers and hexamers were 
calculated from the mobility of the monomer after crosslinking in the absence 
of liposomes (right panel in blot). 


oriented with the carbohydrate-binding loop pointing towards the 
central channel, and the loop encompassing residues 93-99 and the 
N and carboxy (C) termini oriented towards the lipid bilayer (Fig. 3d). 
The resolution of our map did not allow us to extract detailed information 
about intermolecular interactions in the pore complex. There was 
imperfect docking of the carbohydrate-binding loop, the loop encompassing 
residues 93-99, and the far N terminus (Fig. 3d), consistent with 
the conformational flexibility of these regions (Extended Data Fig. 7b). 

We used mutagenesis to assess experimentally the orientation of 
RegIIIα in the pore complex. Our model predicts that the basic residue 
Lys 93 is oriented towards the lipid bilayer (Fig. 3d) and thus might be 
involved in interactions with the negatively charged phospholipids 
required for RegIIIα-liposome interactions (Fig. 1d, e). A Lys93Ala 
mutation, but not conservative Lys93Arg and Lys93His mutations, 
reduced the toxicity of RegIIIα for liposomes as well as intact bacteria 
(Fig. 3e, f and Extended Data Fig. 8a). In contrast, a Glu114Gln mutation, 
which resides in the carbohydrate-binding loop (Fig. 3c), did not 
have an impact on membrane toxicity, consistent with its predicted 
position near the pore interior (Fig. 3d, e). As expected, the Lys93Ala 
mutation but not the Glu114Gln mutation inhibited filament formation 
(Extended Data Fig. 8b). Finally, the orientation of the N terminus 
towards the lipid bilayer is consistent with the role of the N-terminal 
pro-segment in inhibiting RegIIIα interactions with lipid and reducing 
membrane toxicity (Fig. 1a, f, h, j, k). 

We next calculated the energetics of pore insertion into a PC-like membrane 
bilayer using physics-based computational modelling (Extended 
Data Fig. 9a-d). The model predicts that basic residues are located 
near the membrane-water interface whereas a strip of hydrophobic 
and polar residues is buried in the membrane core (Fig. 3g). The 
complex presents a positive electric field to the membrane (Extended 
Data Fig. 9e, f), creating an unfavourable electrostatic energy unless 
negatively charged PS-like lipids are added to the membrane (Fig. 3h). 
This is consistent with our finding that PS lipids are necessary for 
RegIIIα toxicity (Fig. 1d, e). Finally, calculations on the Lys93Ala 
mutant showed reduced stability (Fig. 3h) due to loss of favourable 
electrostatic interactions between Lys 93 and negatively charged lipids. 
Thus, the model reveals that charge sequestration is a critical determinant 
of RegIIIα pore stability in the membrane. Furthermore, the 
model predicts that Arg 166 interacts with the membrane surface 
(Extended Data Fig. 10a). Consistent with this prediction, an Arg166Ala 
mutation reduced membrane toxicity of RegIIIα (Extended Data 
Fig. 10b). In contrast, mutating Arg 39, which is exposed to aqueous 
solvent in the model, had little effect on RegIIIα membrane toxicity 
(Extended Data Fig. 10a, b). Thus, our model accurately predicts the 
experimental behaviour of the RegIIIα pore. 

RegIIIα selectively targets Gram-positive bacteria, raising the question 
of why RegIIIα cannot kill Gram-negative bacteria by permeabilizing 
the outer membrane. In contrast to PC/PS liposomes, liposomes 
composed of an Escherichia coli total lipid extract were not disrupted 
by RegIIIα (Fig. 4a), indicating that a component of the lipid extract 
inhibited membrane permeabilization. Lipopolysaccharide (LPS), a major 
constituent of the Gram-negative outer membrane, inhibited 
RegIIIα-mediated liposome disruption and antibacterial activity (Fig. 4b, c), 
indicating that LPS is one factor that prevents RegIIIα-mediated 
permeabilization of Gram-negative bacteria. 

Finally, we postulated that the trypsin-cleavable inhibitory N terminus 
of pro-RegIIIα evolved to suppress pore-forming activity and 
thus minimize cytotoxicity during RegIIIα synthesis and storage in 
epithelial cells. In support of this idea, RegIIIα was cytotoxic towards 
cultured intestinal epithelial cells (MODE-K), and the pro-segment 
suppressed this cytotoxicity (Fig. 4d, e). 

Thus, RegIIIα kills its bacterial targets by oligomerizing on the bacterial 
membrane to form a membrane-penetrating pore (Extended 
Data Fig. 1). Membrane attack by pore formation represents a previously 
unappreciated biological activity for the C-type lectin family. Our 
findings may provide insight into the evolutionary origins of the 
lectin-mediated complement pathway, in which recruited complement proteins 
disrupt microbial membranes. With its intrinsic capacity for 
membrane attack, RegIIIα may represent a more evolutionarily primitive 
mechanism of lectin-mediated innate immunity. We propose that 
the lectin-mediated complement pathway could have evolved from a 
directly bactericidal ancestral lectin, with the bacterial recognition 
function retained by the descendent C-type lectin(s) and the membrane 
attack function assumed by recruited accessory proteins that 
assemble into the membrane attack complex. 
Figure 3 | Structural model of the RegIIIα pore complex. a, Top and side 
view of the cryoEM reconstruction of the RegIIIα filament. b, Top and side view 
of the cryoEM map of the RegIIIα hexameric complex at a nominal 9.2 Å 
resolution. c, Ribbon representation of the crystal structure of active 
monomeric RegIIIα (Protein Data Bank (PDB) code 4MTH), aligned with 
the pro-RegIIIα structure (PDB code 1UV0). The first ten residues of the 
N-terminal pro-segment are disordered and are therefore missing from the 
structure; these residues have been depicted as a dashed red line. Side chains in 
the loop encompassing amino acids 93-99 (KSIGNSY) are shown as sticks. 
d, Stereo diagram showing docking of the active RegIIIα crystal structure into 
the cryoEM density map. The docked structures are alternately coloured blue 
and cyan to aid in visualization of the individual subunits. The positions of 
Lys 93 (K93) and Glu 114 (E114) are indicated. e, 5 µM of wild-type (WT), 
Lys93Ala (K93A) mutant, or Glu114Gln (E114Q) mutant RegIIIα was added to 
100 µM carboxyfluorescein-loaded liposomes and dye efflux was monitored. 
f, 1 µM wild-type or Lys93Ala mutant RegIIIα was assayed for membrane 
disruption in bacteria using the SYTOX uptake assay described in Fig. 1. Assays 
were performed in triplicate and results are expressed relative to wild-type 
RegIIIα. Error bars indicate s.e.m.; **P < 0.01. g, Most energetically stable 
membrane configuration around the embedded hexamer. The upper 
membrane boundary (grey surface) bends down to expose large charged 
portions of the protein to water, whereas the lower membrane boundary (grey 
surface) exhibits minor deflections. The region between the upper and lower 
boundaries is a water-inaccessible region composed of the high-dielectric 
head-groups and the low-dielectric core. A stretch of hydrophobic residues (yellow) 
is in the centre of the membrane, whereas charged (basic in blue and acidic in 
red) and polar (green) residues are near the upper and lower membrane 
boundaries in the high-dielectric head-group region. h, Using the configuration 
in g, we added negatively charged point charges to the head-group regions to 
model addition of PS lipids (red dots in the inset model). At low values, the total 
insertion energy for the wild-type protein is positive, indicating a lack of 
stability, but above 10 negatively charged lipids, the hexamer is stabilized in the 
membrane (black curve). The optimal lipid configuration is indicated by an 
asterisk. The insertion energy for the Lys93Ala mutant is in red. Inset: 
top-down view; red dots, PS lipids; blue, Arg and Lys residues; white dots, 
uncharged lipid positions. 


Figure 4 | Regulation of RegIIIα pore formation. a-c, RegIIIα pore 
formation is inhibited by lipopolysaccharide. a, 10 µM RegIIIα was added to 
liposomes composed of lipids from an E. coli total lipid extract or from PC/PS as 
a control. b, 10 µM RegIIIα was added to liposomes (100 µM lipid) in the 
presence of varying LPS concentrations. c, 10 µM RegIIIα was added to 
~10* c.f.u. of log phase L. monocytogenes in the presence of varying LPS 
concentrations. The assay was carried out at 37 °C for 2 h, and surviving 
bacteria were quantified by dilution plating. Assays were done in triplicate. 
Results in a-c are representative of two independent experiments. d, e, The 
RegIIIα N-terminal pro-segment limits toxicity towards mammalian cells. 
d, RegIIIα was added to MODE-K cells and cytotoxicity was determined by 
quantifying lactate dehydrogenase (LDH) release. LDH activity was assessed by 
spectrophotometric detection of an enzymatic product of LDH at 492 nm. 
e, 10 µM RegIIIα or pro-RegIIIα was added to MODE-K cells and LDH release 
was quantified. Maximum LDH release was determined by treating cells with 
NP-40 detergent. 
METHODS SUMMARY 

Preparation of recombinant RegIIIα. Recombinant human pro-RegIIIα and 
RegIIIα were expressed and purified according to published methods. 
Membrane permeabilization assays. Listeria monocytogenes was exposed to 25 µM 
RegIIIα, pro-RegIIIα or bovine serum albumin (BSA), incubated with SYTOX 
green, and cell-associated fluorescence was quantified. For dye leakage assays, 
fluorescence of carboxyfluorescein-loaded liposomes was monitored over time 
on a PTI spectrofluorometer, in the presence or absence of RegIIIα or pro-RegIIIα. 
Lipid binding assays. Binding of RegIIIα and pro-RegIIIα to liposomes was 
measured by monitoring fluorescence resonance energy transfer (FRET) between 
protein tryptophan residues and dansyl-PE. Fluorescence spectra were recorded 
on a PTI spectrofluorometer. Measurements of intrinsic tryptophan fluorescence 
of RegIIIα in the absence or presence of liposomes were recorded on a PTI 
spectrofluorometer between 290 and 450 nm at a fixed excitation wavelength of 
280 nm. 

Crosslinking experiments. RegIIIα was incubated with liposomes for 20 min 
followed by 1 h treatment with 5 mM of the crosslinking reagent, EDC, at room 
temperature. The samples were solubilized with 40 mM n-decyl-β-D-maltopyranoside 
(DM) detergent, separated by size-exclusion chromatography, and analysed by 
western blotting with detection by anti-RegIIIγ antibody. 

Determination of the RegIIIα crystal structure. Recombinant RegIIIα lacking 
the N-terminal pro-segment was crystallized using the sitting-drop vapour 
diffusion method. We collected X-ray diffraction data at the Advanced Photon 
Source, Argonne National Laboratory. The structure was determined by molecular 
replacement using a starting model of the full-length RegIIIα structure, followed 
by cycles of model building. Further details are available in Supplementary 
Information. 

CryoEM imaging. Images were acquired in a JEOL JEM2200FS FEG transmission 
electron microscope equipped with an in-column energy filter. Full details are 
available in Supplementary Information. 

Computational modelling studies. Full details are available in Supplementary 
Information. 

Statistical analysis. All P values were calculated using the unpaired, two-tailed 
t-test. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
Extended Data Figure 1 | Model of RegIIIα bactericidal function. An overall 
model that incorporates both the peptidoglycan and lipid-binding functions 
of RegIIIα is depicted. Combining our current and previous findings, we 
propose that RegIIIα recognizes and kills its bacterial targets in two distinct 
steps. First, RegIIIα is secreted from epithelial cells as a soluble monomer that 
recognizes Gram-positive bacteria by binding to peptidoglycan carbohydrate 
via an EPN motif located in the long loop region. Second, RegIIIα kills 
bacteria by oligomerizing in the bacterial membrane to form a hexameric 
membrane-penetrating pore that is predicted to induce uncontrolled ion efflux 
with subsequent osmotic lysis. The inhibitory N terminus of pro-RegIIIα 
hinders lipid binding and consequently suppresses pore formation until it is 
removed by trypsin after secretion into the intestinal lumen. We propose that 
the inhibitory N-terminal peptide evolved to minimize collateral damage from 
the RegIIIα pore-forming activity during RegIIIα storage in the membrane-bound 
secretory granules of epithelial cells. In support of this idea, RegIIIα 
damages mammalian cell membranes and the N-terminal pro-segment limits 
this toxicity (Fig. 4d, e). 
Extended Data Figure 2 | Characterization of RegIIIα membrane 
permeabilization activity. a, b, Impact of NaCl concentration on RegIIIα 
membrane permeabilization activity. a, 10 µM RegIIIα was added to liposomes 
(100 µM lipid) in the presence of varying NaCl concentrations. Representative 
results are shown. b, Averaged results from three independent replicates of the 
experiment shown in a. c, Pro-RegIIIα does not inhibit RegIIIα bactericidal 
activity. 10 µM of purified recombinant pro-RegIIIα, RegIIIα, or a combination 
of the two was added to ~10° c.f.u. of L. monocytogenes for 2 h at 37 °C. 
Surviving bacteria were quantified by dilution plating. 

©2014 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Event histogram: x axis, amplitude (pA); peaks labelled as combinations of i1 and i2, including i1+i2 and 2*i1+2*i2.] 


Extended Data Figure 3 | RegIIIa forms a transmembrane pore. Analysis of 
Regl IIa conductance in lipid bilayers. The trace of a typical single channel 
recording gave rise to the event histogram shown here. At —80 mV, there was a 
short latency before the first opening event, which led to the baseline current of 
—6.5pA at —80 mV. The baseline current was subtracted so that the baseline 
corresponds to a peak at 0 pA. Once we assigned two basic peaks at —53 pA and 
—81 pA as two independent opening events (i1 and 2), all the other major 
peaks in the histogram are linear combinations of these two basic events (as 
labelled). The data therefore suggested two different scenarios. One is that there 
are three pores, and each pore has two different conducting states, which may 
reflect the flexible diameter of the pore. The other is that i1 and i2 reflect two 
different pores that have different diameters, and that there are at least five 
different channels in the membrane to produce the observed histogram. This 
second scenario correlates with the observed variability in helical symmetry. 
With the idea of variability and protein dynamics in mind, it is likely that the 
two types of pores may interconvert with each other in the membrane. From the 
basic events, we estimated the pore diameters by applying the Nernst-Planck 
equation. In the experimental conditions, our recording chambers had 
150mM K*,25mM Na*,215mM Cl ,20mM Mg** and 10mM MES pH5.5 
in the cis side, and 20 mM K*, 25mM Na”, 45mM Cl” and 10mM MES 
pH5.5 in the trans side. The reversal potential (Ex, Ex, Eq) and Eyyzs) for 
each ion could be calculated (Ex = 50.9 mV, En, = OmV = Ems, and 

Eq = —39.5 mV). In the trans side, there is a trace amount of Mg”* (~10 pM), 
which gives a reversal potential Ey of 92 mV. Our dye leakage assay showed 
that the pore was open at Vinem = 0mV transmembrane potential, ruling out 
significant voltage-dependent gating of the RegIIIo channel. On the basis of 
the ion replacement studies we did for different ions, we estimated the relative 
permeability of different ions to be: Px = Pxa = 1.0; Po) = 0.85; Parzs = 0.73 
and Py = 0.66. The measured relative permeation rates showed that the pore 
has very weak cation selectivity, and favours K*/Na* over Mg** due to the 
charge density difference. Under the same assumption, the average
conductance (⟨g⟩) of the two basic opening events (i1 and i2) could be
calculated as follows:

$$\langle g \rangle = \frac{i}{\sum_{\mathrm{ion}} P_{\mathrm{ion}}\,(V_{\mathrm{mem}} - E_{\mathrm{ion}})}$$


The two calculated conductance levels of 100 pS and 152 pS were then
entered into the Nernst-Planck equation for electrodiffusion and gave rise
to an approximate estimate of the pore diameter of 12 Å and 14 Å, respectively,
which is in good agreement with the observed pore size in the reconstructed
three-dimensional structure of the pore (Fig. 3b). A more rigorous
calculation of the ion flux is possible with a high-resolution picture
of the potential profile, but is beyond the scope of this paper.
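
The reversal potentials and the average conductance described above can be checked with a short calculation. The sketch below is illustrative only: it assumes a temperature of 20 °C (not stated in the text), uses the sign convention E = (RT/zF) ln(c_cis/c_trans), and reads the denominator of the conductance expression as a sum over ions of P_ion (V_mem − E_ion). The holding potential `V_HOLD_MV` is a hypothetical placeholder, because the applied voltage is not given in this section.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def nernst_mV(c_cis_mM, c_trans_mM, z, temp_C=20.0):
    """Reversal potential in mV, using E = (RT / zF) * ln(c_cis / c_trans).

    temp_C = 20 is an assumption; the text does not state the temperature.
    """
    rt_over_f_mV = 1e3 * R * (temp_C + 273.15) / F
    return rt_over_f_mV / z * math.log(c_cis_mM / c_trans_mM)


def mean_conductance_pS(i_pA, v_mem_mV, perms, reversals_mV):
    """<g> = i / sum_ion P_ion * (V_mem - E_ion), returned in pS."""
    driving_mV = sum(p * (v_mem_mV - reversals_mV[ion]) for ion, p in perms.items())
    return 1e3 * i_pA / driving_mV  # pA / mV = nS, so scale by 1e3 to get pS


# Reversal potentials from the cis/trans concentrations quoted in the text.
E_K = nernst_mV(150.0, 20.0, z=+1)
E_Cl = nernst_mV(215.0, 45.0, z=-1)

# Relative permeabilities and reversal potentials as reported in the text.
perms = {"K": 1.0, "Na": 1.0, "Cl": 0.85, "MES": 0.73, "Mg": 0.66}
reversals = {"K": 50.9, "Na": 0.0, "Cl": -39.5, "MES": 0.0, "Mg": 92.0}

V_HOLD_MV = -100.0  # hypothetical holding potential, not taken from the text
g_i1 = mean_conductance_pS(-53.0, V_HOLD_MV, perms, reversals)
g_i2 = mean_conductance_pS(-81.0, V_HOLD_MV, perms, reversals)
```

Under the 20 °C assumption, `E_K` and `E_Cl` agree with the reported 50.9 mV and −39.5 mV to within rounding. The conductances depend directly on the assumed holding potential, so `g_i1` and `g_i2` should be read as a demonstration of the formula rather than a reproduction of the reported 100 pS and 152 pS.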


©2014 Macmillan Publishers Limited. All rights reserved 


[Figure panel labels: no protein control; no liposome control; RegIIIα + no liposomes; RegIIIα + liposomes (2 min, 20 min, 20 min + anti-RegIIIα); cryoEM; RegIIIα (wt); RegIIIα (VH). Scale bars, 100 nm.]

38 IRCPKGSKAYGSHC...CKFTD 175 wild-type RegIIIα
38 IRCPKGSKAYGSHC...CKVH 174 mutant RegIIIα


Extended Data Figure 4 | Analysis of liposome-associated RegIIIα by
electron microscopy. a, Negative staining EM controls lacking RegIIIα or
liposomes are shown. b-d, RegIIIα pore complexes assemble into filaments.
b, RegIIIα forms filaments in the presence of lipid vesicles. 20 µM RegIIIα was
incubated for 2 or 20 min with vesicles composed of PC/PS (85%:15%). Samples
were visualized by transmission electron microscopy. Grids were stained with
anti-RegIIIα antibody to confirm that the filaments were composed of
RegIIIα. Filamentation required membranes, as no filaments were observed in
the absence of liposomes. Arrows indicate examples of filaments in each image.
c, 20 µM RegIIIα carrying a mutation near the C terminus (C-terminal
sequence: FTD (wild-type) → VH (mutant)) was incubated for 20 min with
unilamellar vesicles and visualized by cryoEM and negative-staining EM. The
results demonstrate that the VH mutant retains the ability to form pores in lipid
bilayers but cannot form filaments. A comparison of the wild-type and mutated
C terminus is shown below. d, Quantification of filament formation by 20 µM
pro-RegIIIα, wild-type (wt) and C-terminal mutant (VH) RegIIIα in the
presence of vesicles. Results are representative of counts from three different
areas. nd, not detected. The results show that pro-RegIIIα, which cannot form
pores, also cannot assemble into filaments.

Extended Data Figure 5 | Filament formation inhibits RegIIIα membrane
toxicity. We examined the functional properties of the RegIIIα VH mutant
carrying a mutation near the C terminus (C-terminal sequence: FTD (wild-type)
→ VH (mutant)), thus truncating the protein near the C terminus. The
VH mutant lacks the ability to form filaments but retains the ability to form
pores. In accordance with its pore-forming activity, the RegIIIα VH mutant
retained membrane toxicity against liposomes and live bacteria. In fact,
membrane toxicity was modestly enhanced in the RegIIIα VH mutant,
suggesting that trapping of the pore complexes in filaments inhibits their
membrane permeabilizing activity. This function contrasts with that of human
α-defensin-6 filaments, which directly trap bacteria in 'nanonets'.
a, 1.0 µM wild-type (wt) RegIIIα or RegIIIα (VH) mutant was added to 10 µM
carboxyfluorescein-loaded liposomes and dye release was monitored. The
detergent octylglucoside (OG) was added at the end of the experiment to
disrupt remaining liposomes. b, Initial rate of liposome dye release (10 µM
lipid) as a function of wild-type and mutant RegIIIα concentration. c, 5.0 µM
wild-type or RegIIIα (VH) mutant was assayed for membrane disruptive
activity towards whole bacteria using the SYTOX uptake assay described
in Fig. 1. Assays were performed in triplicate. Error bars indicate s.d.;
***P < 0.001.

Extended Data Figure 6 | CryoEM reconstruction of the RegIIIα filament
structure. a, Raw image of a single filament. b, c, Comparison of the average
power spectrum of cryoEM images of individual short helical segments (b) and
the average power spectrum (c) from the projections of the three-dimensional
reconstruction at evenly sampled rotation angles around the helical axis. Layer
lines 1, 5 and 9 were labelled, and layer line 4 was clearly visible. d, Symmetry
variability (Δφ and Δz) in the cryoEM data set. The reconstruction from the
aligned images was imposed with symmetry parameters that vary around the
centre pair (Δφ = 54.5° and Δz = 18.4 Å), and the experimental data set was
classified into nine bins by projection matching. The populations in these
classes were exhibited in a three-dimensional histoplot. Even though the central
bin is the most populated, the distribution is approximately flat. e, Fourier
shell correlation (FSC) calculated from the two independent volumes but
windowed in different boxes. The strong symmetry in the two volumes led
to the FSC ~0.2 at the Nyquist frequency. The first fast drop of the FSC curve to
0.5 was selected to give an approximate estimate of resolution. f, Number of
the filament images aligned with each reference projection from the
three-dimensional model in the last round of refinement. The projections from the
three-dimensional model evenly sampled the orientation space. As expected,
the distribution is fairly flat. g-j, Statistical analyses of the RegIIIα filament
structure. g, First four eigenimages from the multivariate statistical analysis of
the centred filaments in the data set that were padded to 320 pixels in size. The
second and third images lack mirror symmetry around the central line,
suggesting the parity is odd. The fourth image shows the significant local
bending of the filaments, a major limiting factor for us in reaching a better
resolution in our reconstruction. h, A good class average after the multivariate
statistical analysis and hierarchical classification. i, Square root of calculated
power spectrum of the class average in h. The tip of the red arrowhead points at
10.4 Å. j, The layer lines in the average power spectrum of the rotational
projections from the final reconstruction without symmetry imposition extend
isotropically to ~9.2 Å (yellow circle), and further along the vertical direction
(helical axis). k, l, Docking of the RegIIIα crystal structure into the cryoEM
map. k, The three-dimensional reconstruction calculated from the images in
the central bin in d, with a hexameric pore highlighted. l, Stereo image showing
docking of the RegIIIα crystal structure in the cryoEM density map of one
subunit out of the reconstruction.

Cell Dimensions
Redundancy                6.5 (4.3)
Refinement
Resolution (Å)            26.13 - 1.47 (1.52 - 1.47)
No. reflections           22074
R_work / R_free           18.5 / 21.0
No. atoms
    Protein               1901
    Ligand/ion            3
    Water                 181
B-factors
    Protein               19.2
    Ligand/ion            23.1
    Water                 30.0
R.m.s. deviations
    Bond lengths (Å)      1.029
    Bond angles (°)       0.006

*Highest resolution shell is shown in parentheses.

Extended Data Figure 7 | Crystal structure of bactericidally active RegIIIα.
a, Table showing data collection and refinement statistics for the active RegIIIα
crystal structure. b, Crystallographic B-factor map of the active RegIIIα
structure showing areas of conformational flexibility. Red indicates greater
flexibility.

Extended Data Figure 8 | RegIIIα mutagenesis. a, Mutagenesis of Lys 93
(K93) with conservative amino acid substitutions (Arg (R) and His (H)) does
not alter membrane toxicity of RegIIIα. 5 µM of wild-type, Lys93Arg mutant, or
Lys93His mutant RegIIIα was added to 100 µM carboxyfluorescein-loaded
liposomes and dye release was monitored. These mutants retain membrane
toxicity, in contrast to Lys93Ala (Fig. 3e), suggesting the importance of positive
charges at these sites. b, Filamentation of RegIIIα mutants (Lys93Ala (K93A)
and Glu114Gln (E114Q)) correlates with membrane toxicity. 20 µM RegIIIα
Lys93Ala (left panel) or Glu114Gln (right panel) was incubated for 20 min with
unilamellar vesicles and visualized by negative-staining EM. The results
demonstrate that the toxic Glu114Gln mutant, unlike the non-toxic Lys93Ala
mutant, assembles into filaments.

Extended Data Figure 9 | Computational modelling of RegIIIα insertion
into membranes. a, Top-down view of the numeric grid and complex
boundary used in the elasticity calculations to represent the upper leaflet. The
protein complex occupies the white space in the centre, and the membrane-protein
contact curve is the red-white boundary. The membrane is modelled in
all non-white regions. The rectangular grid for the elasticity solver is shown
here coloured by the membrane bending energy density (red is high bending
energy and blue is low bending energy). This calculation corresponds to the
membrane bending shown in Fig. 3g. b-d, Numeric convergence of the model.
b, Convergence of the elastostatic energy. In all panels, per cent error was
calculated as 100 |(E(n) − E(n_max))/E(n_max)|, where E(n) is the energy calculated
with n grid points, and n_max is the maximum number of grid points used. The
elastic energy converges smoothly as n increases, and we used n = 400 in both
the x and y directions for all calculations in the main text, which gives a 5%
error. c, Convergence of the electrostatic energy. Per cent error of the dipole
charge-protein interaction energy (diamonds), protein solvation energy
(squares), anionic lipid charge-protein interaction energy (circles) and the total
electrostatic energy (triangles) are shown as a function of the grid discretization.
A value of n = 161 was used for the calculations discussed in the main text
resulting in a total electrostatic error of 2.5%. d, Convergence of the non-polar
energy. A discretization of n = 100 points was used for the calculations
reported in the main text, and this has a very small error on the order of 0.1%.
Values used for calculations in the main text are indicated by an asterisk.
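
The per cent error used in panels b-d can be written out as a small helper. This is a minimal sketch of the stated formula only; the energies in the example are hypothetical placeholders, not values from the convergence study.

```python
def percent_error(energy_n, energy_nmax):
    """Per cent error 100 * |(E(n) - E(n_max)) / E(n_max)|."""
    return 100.0 * abs((energy_n - energy_nmax) / energy_nmax)


def convergence_table(energies_by_n):
    """Map each grid size n to its per cent error relative to the largest grid."""
    n_max = max(energies_by_n)
    reference = energies_by_n[n_max]
    return {n: percent_error(e, reference) for n, e in sorted(energies_by_n.items())}


# Hypothetical energies (arbitrary units) at increasing grid sizes.
example = {100: 12.0, 200: 10.8, 400: 10.5, 800: 10.0}
errors = convergence_table(example)
```

By construction the largest grid has zero error, so the table shows how quickly the coarser grids approach the reference value.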

e, f, Electrostatic potential of the RegIIIα pore complex. e, In-plane view. The
Poisson-Boltzmann equation was solved using APBS after embedding the
complex in a low dielectric region mimicking the lipid bilayer. The low
dielectric membrane region is deformed corresponding with the lowest energy
shape predicted by our physics-based computational model. Positive (blue)
isocontours of the electrostatic potential are drawn at +5 kcal mol⁻¹ e⁻¹.
f, Out-of-plane view. All details are identical to those in panel a. Both positive
(blue) and negative (red) isocontours of the electrostatic potential are drawn
at ±5 kcal mol⁻¹ e⁻¹. g, Table showing bilayer material properties used in the
modelling calculations. h, Table showing model parameters. References 24-29
are cited in this figure.

Extended Data Figure 10 | Modelling of RegIIIα-membrane interactions.
a, RegIIIα pore complex model shown from the side. Arg 166 (R166) is
located near the water-membrane interface, indicating that it is positioned to
interact with the phospholipid head-groups, whereas Arg 39 is predicted to
be exposed to aqueous solvent. Membrane boundaries predicted from the
[...] are indicated. b, 5 µM of wild-type, Arg166Ala
mutant, or Arg39Ala mutant RegIIIα was added to 100 µM carboxyfluorescein-loaded
liposomes and dye release was monitored. The experimental results are
consistent with the position
in the model.

Structural basis for recognition of synaptic vesicle
protein 2C by botulinum neurotoxin A


Roger M. Benoit, Daniel Frey, Manuel Hilbert, Josta T. Kevenaar, Mara M. Wieser, Christian U. Stirnimann,
David McMillan, Tom Ceska, Florence Lebon, Rolf Jaussi, Michel O. Steinmetz, Gebhard F. X. Schertler,
Casper C. Hoogenraad, Guido Capitani & Richard A. Kammerer


Botulinum neurotoxin A (BoNT/A) belongs to the most dangerous
class of bioweapons. Despite this, BoNT/A is used to treat a wide
range of common medical conditions such as migraines and a variety
of ocular motility and movement disorders. BoNT/A is probably
best known for its use as an antiwrinkle agent in cosmetic applications
(including Botox and Dysport). BoNT/A application causes
long-lasting flaccid paralysis of muscles through inhibiting the release
of the neurotransmitter acetylcholine by cleaving synaptosomal-associated
protein 25 (SNAP-25) within presynaptic nerve terminals.
Two types of BoNT/A receptor have been identified, both of which
are required for BoNT/A toxicity and are therefore likely to cooperate
with each other: gangliosides and members of the synaptic
vesicle glycoprotein 2 (SV2) family, which are putative transporter
proteins that are predicted to have 12 transmembrane domains,
associate with the receptor-binding domain of the toxin. Recently,
fibroblast growth factor receptor 3 (FGFR3) has also been reported
to be a potential BoNT/A receptor. In SV2 proteins, the BoNT/A-binding
site has been mapped to the luminal domain, but the molecular
details of the interaction between BoNT/A and SV2 are unknown.
Here we determined the high-resolution crystal structure of the
BoNT/A receptor-binding domain (BoNT/A-RBD) in complex with
the SV2C luminal domain (SV2C-LD). SV2C-LD consists of a right-handed,
quadrilateral β-helix that associates with BoNT/A-RBD
mainly through backbone-to-backbone interactions at open β-strand
edges, in a manner that resembles the inter-strand interactions in
amyloid structures. Competition experiments identified a peptide
that inhibits the formation of the complex. Our findings provide a
strong platform for the development of novel antitoxin agents and
for the rational design of BoNT/A variants with improved therapeutic
properties.

The overall structure of BoNT/A-RBD in complex with SV2C-LD is
shown in Fig. 1, and the data collection and refinement statistics are
listed in Extended Data Table 1. The convex interface formed by the
amino- and carboxy-terminal BoNT/A-RB subdomains interacts with
SV2C-LD. SV2C-LD forms a right-handed, quadrilateral β-helix, a
fold that is characteristic of pentapeptide-repeat proteins (Extended
Data Fig. 1). Based on a search of the Dali server, the proteins that
are most structurally similar to SV2C-LD in the Protein Data Bank
(PDB) are MfpA, a pentapeptide-repeat protein from Mycobacterium
tuberculosis, and AlbG, an McbG-like protein from Xanthomonas
albilineans.

It has been noted that the molecular architecture of β-helices is
similar to the structure of amyloid fibrils. Like in amyloid fibrils, the
regular β-sheet edges of β-helices are in a conformation that provides
the possibility of interacting with other β-strands and aggregating into
higher-order structures. To avoid the formation of such aggregates, the
ends of β-helices are typically capped by other secondary-structure
elements. No additional secondary-structure elements are seen
on the N-terminal side of SV2C-LD, but the first N-terminal turn
of the SV2C-LD β-helix contains more charged amino acids in the
inward facing positions than do the other coils. Such strategically
placed charges are known to prevent aggregation of single-sheet proteins
and β-propellers. At its C terminus, SV2C-LD contains a short 3₁₀-helix
(Extended Data Fig. 1), but this helix precedes the two β-strands
that interact with BoNT/A-RBD. The interaction between the open
β-strand of SV2C-LD and the β-strand edge of BoNT/A-RBD is dominated
by backbone-backbone hydrogen bonds, resembling the inter-strand
interactions in amyloid structures. Our findings therefore
indicate that Clostridium botulinum has exploited a weakness in the
structure of SV2 proteins to transport BoNT/A into host cells.


Figure 1 | The BoNT/A-RBD-SV2C-LD complex. a, Cartoon representation
of the structure of BoNT/A-RBD (green) in complex with SV2C-LD (blue).
Dotted lines indicate flexible regions of SV2C-LD that were not visible in the 
structure. The terminal residues of the SV2C-LD expression construct are 
indicated. b, Close-up view of the interaction site (boxed area from a). 
Backbone-backbone and backbone-side chain hydrogen bonds are indicated 
by dashed orange lines. Side chain-side chain interactions are shown as dashed 
purple lines. c, The complex structure from a different view. d, Close-up of 
the interaction site (boxed region from c). b, d, Carbon is shown in light blue for 
SV2C-LD and green for BoNT/A-RBD. In both structures, dark blue denotes 
nitrogen, red denotes oxygen and yellow denotes sulphur.

BoNT/A has been shown to bind to neurons not only through SV2
proteins, or potentially FGFR3 (ref. 6), but also in conjunction with
gangliosides as co-receptors. A superimposition of our BoNT/A-RBD-SV2C-LD
complex crystal structure with the previously reported full-length
BoNT/A structure and the BoNT/A-RBD-GT1b structure
shows that the binding site for gangliosides is located in the C-terminal
BoNT/A-RB subdomain almost opposite to the SV2C-binding site
(Fig. 2). The two binding regions are separated by a distance of approximately
40 Å, illustrating that both receptors can simultaneously interact
with the toxin. This finding might have important functional implications
for the translocation of BoNT/A across the synaptic vesicle membrane
and for subsequent cell intoxication, which are mechanisms of
major importance that are still poorly understood. Recent studies using
BoNT/B and BoNT/E have demonstrated that the low pH in the endosomal
lumen and binding to the ganglioside GT1b are both required
for the toxins to transform into oligomeric, hydrophobic membrane-associated
channels. Although the mechanism of channel formation
has not yet been studied at such detail for BoNT/A, additional binding
to SV2 proteins might be a prerequisite for toxin translocation. Alternatively,
binding to SV2 proteins might prevent the translocation of
BoNT/A by hindering the formation of oligomeric transmembrane
channels. Such inhibition might be relieved by acidification in the endosomal
lumen and result both in the dissociation of the SV2-BoNT/A
complex and the known inhibition of BoNT/A translocation by the
RBD at neutral pH. One promising candidate for a pH-sensing residue
in SV2C-LD is H564, which is present at the interface of the complex.
Consistent with this hypothesis, we observed an approximately five-fold
decrease in the dissociation constant (K_d) of the complex between


Figure 2 | Proposed model for simultaneous ganglioside and SV2C binding
by BoNT/A. Superimposition of full-length BoNT/A, the BoNT/A-RBD-GT1b
ganglioside complex and the BoNT/A-RBD-SV2C-LD complex. The
ganglioside (white and atom colours; labelled in the figure) and SV2C-LD 
(blue) bind to opposite sides of the C-terminal BoNT/A-RB subdomain 
(dark green). The N-terminal BoNT/A-RB subdomain (pale green), the 
translocation domain (including the belt (red)) and the protease domain 
(yellow) point away from the binding site. The SV2C transmembrane portion 
(light grey) was modelled. The position of the plasma membrane is 
proposed. Linker regions are shown as black dotted lines. Linker boundaries 
are indicated.

pH 7.5 and pH 5 in fluorescence anisotropy experiments (Extended
Data Fig. 2).

Comparison of BoNT/A-RBD structures in the free state and SV2C-LD-bound
state reveals no significant conformational changes in the
SV2C-LD-interacting elements of BoNT/A-RBD on complex formation.
Cα superimposition of the BoNT/A-RBD onto the BoNT/A-RBD-SV2C-LD
complex results in a root mean squared deviation
(r.m.s.d.) of 0.54 Å over 412 residues. At the interface, 15 SV2C-LD
residues and 19 BoNT/A-RBD residues are partially or fully buried,
resulting in a contact area of approximately 596 Å². Several of these
buried residues are engaged in backbone-backbone hydrogen bonds
between interacting β-strands (Extended Data Table 2). One prominent
feature seen in the structure is the accumulation of positively
charged surface residues at the BoNT/A-RBD interface, whereas the
binding surface of SV2C-LD is slightly negatively charged (Fig. 3). The
residues are not engaged in inter-domain salt bridges; therefore, relatively
few specific side-chain interactions are observed in the complex
structure. Among these, the interaction between R1156 of BoNT/A-RBD
and F563 of SV2C-LD is of particular interest, as it is a cation-π
stacking interaction. The hydrogen bond between SV2C-LD N559 and
BoNT/A-RBD Y1149 extends the interaction to the more distant
β-strand of the toxin. Hydrogen bonds between SV2C-LD N559 and
BoNT/A-RBD T1145, as well as between the backbone oxygen and
nitrogen of SV2C-LD F557 and BoNT/A-RBD T1146, stabilize the
complex downstream of the C-terminal boundary of the directly interacting
toxin β-strand. R1294 is a residue that prominently contributes
to the positive charge of the BoNT/A-RBD surface. Together, these
residues seem to be important for shape and charge complementarity
and consequently for binding specificity.
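
For readers unfamiliar with the r.m.s.d. quoted above, the quantity is the square root of the mean squared distance between paired atoms after superposition. The sketch below illustrates only that final step, assuming the two coordinate sets have already been superimposed and paired atom by atom; the coordinates are hypothetical and the function is not the authors' software.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation (same units as the input) between paired 3D points.

    Assumes the two structures are already superimposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical Cα positions (Å) for two tiny, already-aligned fragments.
free_state = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
bound_state = [(0.0, 0.0, 0.5), (3.8, 0.0, 0.5), (7.6, 0.0, 0.5)]
deviation = rmsd(free_state, bound_state)
```

A value well below 1 Å over several hundred Cα atoms, as reported here, indicates that the backbone is essentially unchanged on binding.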

The importance of these prominent side-chain interactions was
tested by site-directed mutagenesis (Extended Data Fig. 3). Mutation
of the two BoNT/A-RBD threonines T1145 and T1146 completely
abrogated binding to SV2C-LD in pull-down assays, and mutation
of R1294 or R1156 in BoNT/A-RBD significantly reduced this binding
(Extended Data Fig. 3). Likewise, substitution of the phenylalanine at
position 563 of SV2C-LD with alanine strongly reduced the binding to

Figure 3 | Open book representation of the BoNT/A-RBD-SV2C-LD
interaction site. The components of the complex are shown separately and
coloured according to their electrostatic potential (±3 kT/e, where k is the
Boltzmann constant, T is temperature and e is the elementary charge): blue
denotes basic residues, red denotes acidic residues and white denotes
hydrophobic residues. The yellow ellipse indicates the approximate location of
the hydrogen-bonding residues. The orange ellipse indicates the positively
charged R1294 in BoNT/A-RBD and the mostly negatively charged surface
region in SV2C-LD. R1294 is not defined in the electron density of the complex
structure and hence does not participate in hydrogen bonds or salt bridges.

Figure 4 | The BoNT/A-A2 peptide inhibits the internalization of BoNT/A-RBD
by striatal neurons (at 18 days in vitro). a, Co-localization of
endogenous SV2C with GFP-BoNT/A-RBD (top) or endogenous vesicular
inhibitory amino acid transporter (VGAT) (bottom) in cultured striatal
neurons. Scale bars, 50 µm. Enlargements of the regions indicated by the white
rectangles are shown in the panels on the right. Scale bar, 10 µm.
b, Quantification of GFP-BoNT/A-RBD uptake by cultured striatal neurons.
The number of co-localizing GFP-BoNT/A-RBD and VGAT puncta was


BoNT/A-RBD, whereas mutation of the SV2C-LD residue N559 had
no significant effect on interaction with the toxin. To further characterize
the complex interaction, we performed fluorescence anisotropy
experiments. Analysis of the wild-type proteins yielded a K_d
value of 0.26 ± 0.2 µM. K_d values obtained for the mutant proteins
were consistent with the pull-down results (Extended Data Fig. 4 and
Extended Data Table 3).
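
To make the meaning of the K_d value concrete, the sketch below evaluates a simple 1:1 binding isotherm. It assumes the titrated partner is in large excess over the labelled one, so that the fraction bound is [L] / (K_d + [L]); this simplification, the function names and the anisotropy end points `R_FREE` and `R_BOUND` are illustrative assumptions, not the fitting procedure used in the paper.

```python
def fraction_bound(ligand_uM, kd_uM):
    """Fraction of labelled protein bound in a 1:1 model with ligand in excess."""
    return ligand_uM / (kd_uM + ligand_uM)


def anisotropy(ligand_uM, kd_uM, r_free, r_bound):
    """Observed anisotropy as a weighted average of the free and bound states."""
    f = fraction_bound(ligand_uM, kd_uM)
    return r_free + (r_bound - r_free) * f


KD_WT_UM = 0.26                 # wild-type K_d reported in the text (µM)
R_FREE, R_BOUND = 0.05, 0.15    # hypothetical anisotropy end points

half_saturation = fraction_bound(KD_WT_UM, KD_WT_UM)
curve = [(c, anisotropy(c, KD_WT_UM, R_FREE, R_BOUND)) for c in (0.0, 0.1, 0.26, 1.0, 10.0)]
```

In this model, half of the labelled protein is bound when the titrant concentration equals K_d, which is why a weaker-binding mutant shifts the whole curve to higher concentrations.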

It has been reported that SV2 peptides harbouring the toxin-binding
site prevent BoNT/A from binding to neurons. We therefore analysed
the potency of short BoNT/A-RBD- and SV2C-LD-derived peptides at
inhibiting complex formation. Based on the interactions seen in the
crystal structure, the peptide sequences were designed to correspond to
the two SV2C-LD β-strands (peptide SV2C-A3), the BoNT/A-RBD
β-strand (peptide BoNT/A-A5) and the BoNT/A-RBD β-sheet (peptide
BoNT/A-A2) that interact in the complex (Extended Data Fig. 3g).
We observed full inhibition of complex formation in the presence of
the peptide corresponding to the BoNT/A-RBD β-sheet and partial
inhibition in the presence of the peptide corresponding to the two
SV2C-LD β-strands, whereas no reduction in binding was seen in the
presence of the peptide corresponding to the single BoNT/A-RBD
β-strand. These results indicate that the three-dimensional conformation
of SV2C-LD is crucial for the interaction. Consistent with this
conclusion, the shortest SV2C-LD fragment reported to bind to BoNT/A
spans residues 529-566 of the SV2C sequence. The fragment
extends over almost two turns of the β-helix and might therefore form
a moderately stable three-dimensional structure. We next tested the
BoNT/A-A2 β-sheet peptide for inhibition of BoNT/A-RBD binding
to and internalization by cultured striatal neurons (at 18 days in vitro)
(Fig. 4 and Extended Data Fig. 5), which endogenously express SV2C,
or HEK-293 cells that were transfected with Flag-tagged SV2C protein
(Extended Data Fig. 6). In the absence of the toxin-derived peptide,
BoNT/A-RBD co-localized with SV2C. Pre-incubation of the cells with
5 µM glutathione S-transferase (GST)-tagged BoNT/A-RBD peptide
resulted in a significant decrease in toxin binding and internalization,
whereas no inhibition of the complex interaction was observed in the
presence of a GST-only control.

The findings of our study have important implications for the development
of novel antitoxin agents, as well as for the medical and cosmetic
applications of BoNT/A. The toxin has been described as one of the six
most dangerous bioweapons. Currently, the only available antidotes

normalized to the total number of VGAT puncta. No GFP-BoNT/A-RBD,
0.01% ± 0.00%; GFP, 0.03% ± 0.01%; no high K⁺ buffer, 0.57% ± 0.18%; no
glutathione S-transferase (GST)-BoNT/A-A2 (GST-A2), 8.53% ± 0.09%;
GST-BoNT/A-A2 (GST-A2), 3.42% ± 0.69%; GST, 6.84% ± 1.12%. n = 20
images per group from two cultures, Mann-Whitney U test; ***P < 0.001;
NS, not significant. N = 2 independent experiments. All tests were performed
two-sided. Data are presented as mean ± s.e.m.


for BoNT/A have severe side effects. Proposed alternative approaches
include peptide inhibitors, and all three domains of the BoNTs have
been considered as potential drug targets. The detailed structural
information reported on the SV2C-LD-BoNT/A-RBD interaction
should therefore be of particular interest for the targeted development
of BoNT/A-specific antibodies or high-affinity peptide inhibitors
directed against the SV2-binding interface.

Depending on the type and area of medical or cosmetic application,
the recommended dose of BoNT/A varies markedly, carrying a substantial
risk of accidental BoNT/A overdose and poisoning. The crystal
structure of the BoNT/A-RBD in complex with SV2C-LD provides a
strong structural basis for the rational design of BoNT/A variants with
an attenuated capacity to bind to SV2 proteins. Such variants are promising
candidate proteins for broadening the very narrow therapeutic
window, possibly resulting in much safer applications of the toxin.


METHODS SUMMARY 


All of the methods are described in detail in the Methods. The proteins were cloned
into pET-based vectors and expressed in bacteria using auto-induction. Protein
purification was performed using immobilized metal affinity chromatography
(IMAC) or GST sepharose affinity chromatography followed by size-exclusion
chromatography. The purified BoNT/A and SV2C domains were combined in a
1:1 ratio. The complex was purified in an additional size-exclusion chromatography
step, concentrated to 8 mg ml⁻¹ and crystallized by vapour diffusion at
20 °C. The reservoir solution consisted of 100 mM HEPES, pH 7.5, 6% PEG 8000,
8% glycerol and 100 mM NaCl. For cryoprotection, the reservoir solution was
supplemented with an additional 10% glycerol. Diffraction data were collected
at the Swiss Light Source (SLS). The crystals belong to space group C 1 2 1 and are
pseudomerohedrally twinned with twin law −h, −k, l. The structure was solved by
molecular replacement using PDB entry 3FUO as a search model. Pull-down
assays were performed using IMAC with 6×His-tagged BoNT/A-RBD and
untagged SV2C-LD. K_d determination was performed by fluorescence anisotropy.
For functional assays, primary striatal cultures were prepared from embryonic day
18 rat brains, as described for hippocampal neurons. At 18 days in vitro, cultured
striatal neurons were stimulated using a high K⁺ buffer (70 mM KCl, 51.5 mM
NaCl, 25 mM HEPES, 30 mM glucose, 2 mM CaCl₂ and 2 mM MgCl₂; pH 7.4)
with or without the GST-BoNT/A-A2 peptide or a GST control, or neurons were
treated with a control buffer (2.5 mM KCl, 119 mM NaCl, 25 mM HEPES, 30 mM
glucose, 2 mM CaCl₂ and 2 mM MgCl₂; pH 7.4), for 15 min at 37 °C. The neurons
were incubated with GFP-BoNT/A-RBD or GFP only for 10 min at 37 °C and then
washed with high K⁺ or control buffer and then fixed. HEK293T cells were
cultured in DMEM/Ham's F-10 (50/50) medium supplemented with 10% FCS
and 1% penicillin/streptomycin at 37 °C in 5% CO₂ and were transfected with the
plasmid pGW2-mRFP (monomeric red fluorescent protein) and SV2C-Flag, or
pGW2-mRFP only. After 24 h, the cells were incubated with fresh culture medium
containing GST-BoNT/A-A2 or a GST control for 15 min at 37 °C. The cells were
then incubated with BoNT/A-RBD for 10 min at 37 °C, washed with fresh culture
medium and fixed. Immunohistochemistry, confocal imaging, data representation
and statistical analyses were performed as previously described. Image analysis
was performed using ImageJ software.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper.

METHODS

Cloning and protein production. The cDNAs encoding human SV2C (Swiss- 
Prot Q496J9) amino acids 456-574 and BoNT/A (Swiss-Prot P10845) amino acids 
871-1296 were amplified by PCR using DNA templates that were codon-optimized 
for expression in Escherichia coli (GENEWIZ). The primer sequences used were: 
BoNT/A-RBD, 5’-GTGCCGCGCGGATCCAAAAACATCATCAACACGTCC 
ATTCTCAACCTC-3’ and 5’- TTGCTAAGTGAGCTCTCATTACAGCGGAC 
GTTCACCCCAACC-3'; and SV2C-LD, 5'-GTGCCGCGCGGATCCGACGTA 
ATCAAGCCGCTCCAGTC-3’ and 5'-TTGCTAAGTGAGCTCTCATTAGTCA 
AACGTGATTTGGCAACCCG-3’. The BoNT/A-A2 (β-sheet) peptide insert encoding
residues 1140-1153 was generated by using synthetic primers (5’-CCACGA
ACATATATCTGAATTCCAGCTGATAATTGACAGAGCTCACTTAGCAAG 
ATATAATACAAATCCGCCGAGC-3’ and 5'-CAGATATATGTTCGTGGTC 
ATTACACTACCGCGACCGGATCCGGGTCCCTGGAACAGAACTTCCAGA 
TCCGATTTTG-3’). Cloning into in-house assembled, pET-based vectors containing
an N-terminal 6×His, 6×His-GFP or GST tag was performed by
co-transformation cloning. Mutagenesis was performed as described previously.
All insert sequences were verified by DNA sequencing (by GATC).

The proteins were expressed separately in E. coli strain NiCo21(DE3) (New
England Biolabs). Precultures (100 ml Luria Bertani (LB) medium supplemented
with 50 µg ml⁻¹ kanamycin) were grown overnight at 37 °C in a shaking incubator.
ZYM-5052 medium for auto-induction (4 l supplemented with 50 µg ml⁻¹
kanamycin) was inoculated 1:40 with the overnight culture, and the bacteria were
grown for 7 h at 37 °C. Subsequently, the temperature was lowered to 20 °C, and
the incubation was continued overnight. The GST control was expressed in LB
medium supplemented with 100 µg ml⁻¹ ampicillin, using induction with 1 mM
isopropylthiogalactoside (IPTG).

The cells were harvested by centrifugation. The cell pellets were resuspended in
100 ml lysis buffer (50 mM Tris, pH 7.5, 500 mM NaCl, 10 mM imidazole, 10 mM
β-mercaptoethanol and 1 cOmplete EDTA-free protease inhibitor cocktail tablet
(Roche Diagnostics)). The cells were lysed on ice by ultrasonication. Lysate clearing
was performed for 1 h at 25,000g, and the resultant supernatant was filtered
(0.45 µm filter). The proteins were subsequently purified by IMAC (on a 5 ml
HisTrap FF Crude column, GE Healthcare) and gel filtration (HiLoad 16/60
Superdex 200, GE Healthcare). The GST control was purified by GST sepharose
affinity chromatography (GE Healthcare) followed by size-exclusion chromatography.
SV2C-LD and BoNT/A-RBD were combined in a 1:1 ratio, co-purified by
size-exclusion chromatography and concentrated to 8 mg ml⁻¹ for crystallization.

Crystallization and structure elucidation. The BoNT/A-RBD-SV2C-LD complex
was crystallized in 100 mM HEPES, pH 7.5, 6% PEG 8000, 8% glycerol and
100 mM NaCl using a grid screen around the conditions described for the
crystallization of BoNT/A-RBD alone”. For cryoprotection, the reservoir solution was
supplemented with an additional 10% glycerol. A complete 2.3 Å resolution data
set was collected from a single crystal at 100 K at the X06DA beamline of the Swiss
Light Source using a wavelength of 1.000 Å. The high-resolution limit was chosen
according to the guidelines recently suggested by Karplus and Diederichs*’. Data 
processing was carried out with XDS*. The crystals belong to space group C 1 2 1
and are pseudomerohedrally twinned with twin law −h, −k, l. The structure of the
BoNT/A-RBD-SV2C-LD complex was solved by molecular replacement using
residues 875-1295 of BoNT/A-RBD from PDB entry 3FUO as a search model”.
Two BoNT/A-RBD molecules could be positioned in the C 1 2 1 asymmetric unit 
by using Phaser*’. The initial atomic model was refined with phenix.refine™, 
taking into account the twin law. From the first refinement cycles, additional 
density corresponding to SV2C-LD became visible. This allowed manual building 
of two copies of the domain, each one bound to a copy of BoNT/A-RBD in a 1:1 
complex. Coot** was used for model building. The final refined structure exhibits 
good geometry and stereochemistry, as validated with MolProbity”® (residues in 
favoured regions 90.17% and outliers 0.81%). Structure figures were prepared 
using PyMOL”. Electrostatic potentials were calculated using APBS**. Secondary-
structure matching superimpositions” were performed using Coot**. PROMOTIF”
was used for structure analysis. A homology model of the transmembrane part of 
SV2C was generated using Phyre*’. A PDB file of a lipid bilayer slice by Heller 
et al.” was used for the generation of Fig. 2. Hydrogen bonds at the complex 
interface were analysed using PDBePISA* v1.37. 

Pull-down assays. Purified 6×His-tagged BoNT/A-RBD and untagged SV2C-LD
were centrifuged separately at 16,000g for 15 min at 4 °C. The proteins were then
combined and incubated overnight at 4 °C in pull-down buffer (20 mM Tris,
pH 7.8, 150 mM NaCl and 5 mM β-mercaptoethanol). The interactions of wild-type
BoNT/A-RBD and mutant BoNT/A-RBD with wild-type SV2C-LD were
analysed at 5 µM and 40 µM, respectively. The binding of wild-type SV2C-LD
and mutant SV2C-LD to wild-type BoNT/A-RBD was analysed at 10 µM and 5 µM,
respectively. After incubation, the samples were centrifuged, and the supernatants
were incubated with ~50 µl Ni Sepharose 6 Fast Flow resin (GE Healthcare) for
1 h. The Ni Sepharose resin was washed four times with 2 ml ice-cold pull-down
buffer. Proteins were eluted in 100 µl pull-down buffer supplemented with
350 mM imidazole. The eluted solutions were analysed by SDS-PAGE.

Peptide competition experiments were performed in the same way except
that 50 µl of a 10 mM peptide suspension in pull-down buffer was mixed with the
partner protein domain and incubated on ice for 5 min before adding the other
protein domain. Final concentrations were 5 µM BoNT/A-RBD, 20 µM SV2C-LD
and 2 mM peptide. Peptides were ordered from Charité. The peptide sequences
were as follows: SV2C-A3, NH2-DSEFKNCSFFHNK-COOH; BoNT/A-A2
(β-sheet), NH2-RGSVMTTNIYLNSS-COOH; and BoNT/A-A5 (β-strand),
NH2-RGSVMTTN-COOH.

Protein labelling. The accessible cysteine residues of SV2C-LD were covalently 
labelled overnight with fluorescein maleimide (Molecular Probes) in 50 mM Tris, 
pH 7.5, 150 mM NaCl and 1 mM Tris(2-carboxyethyl)phosphine, using a 20-fold
excess of dye. Successful labelling and removal of free fluorescein by desalting 
(illustra NAP-5 columns, GE Healthcare) was confirmed by SDS-PAGE. The 
protein:fluorescein ratio was determined from the absorption at 280 nm and 493 nm. 
The absorption at 280 nm was corrected for the absorption of fluorescein to cal- 
culate the protein concentration. The labelling efficiency was approximately 50%. 
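
The absorbance correction described above can be sketched as follows. This is a minimal illustration, not the authors' calculation: the extinction coefficients and the 280 nm correction factor are hypothetical placeholders (the paper does not report them), and a 1 cm path length is assumed.

```python
def degree_of_labelling(a280, a493, eps_protein, eps_dye, cf280):
    """Estimate dye:protein ratio from absorbance at 280 nm and 493 nm.

    a280, a493   -- measured absorbances (1 cm path length assumed)
    eps_protein  -- protein extinction coefficient at 280 nm, M^-1 cm^-1 (assumed)
    eps_dye      -- dye extinction coefficient at 493 nm, M^-1 cm^-1 (assumed)
    cf280        -- fraction of the dye's 493 nm absorbance seen at 280 nm (assumed)
    """
    dye_molar = a493 / eps_dye
    # Remove the dye's own contribution to the 280 nm reading.
    a280_corrected = a280 - cf280 * a493
    protein_molar = a280_corrected / eps_protein
    return protein_molar, dye_molar, dye_molar / protein_molar


# Hypothetical readings and constants, chosen only to illustrate the arithmetic.
protein, dye, ratio = degree_of_labelling(
    a280=0.205, a493=0.35, eps_protein=10_000, eps_dye=70_000, cf280=0.30
)
print(f"labelling efficiency: {ratio:.0%}")
```

With real data, the constants would come from the dye supplier's data sheet and from the protein sequence.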

Kd determination. The binding affinities of SV2C-LD to BoNT/A-RBD (wild-type
and mutants) were determined in 50 mM Tris, pH 7.5, and 150 mM NaCl by
fluorescence anisotropy titration using 100 nM fluorescein-labelled SV2C-LD
(SV2C-LD-F). The anisotropy signal was measured in a Cary Eclipse Fluorescence
Spectrophotometer (Agilent Technologies) equipped with automated polarizers
(λex 480 nm, λem 520 nm, slit 10 nm, 20 °C, 5 s signal acquisition, g = 1.4495). Each
data point was averaged from five measurements. To account for the reduced
quantum efficiency of fluorescein at pH 5 (20 mM citrate and 150 mM NaCl),
the slits were opened to 20 nm. The Kd values were determined by fitting the data
to a one-site-binding model using Origin 7 (OriginLab). The binding affinity of
unlabelled SV2C-LD and its mutants was determined by displacing SV2C-LD-F
from the complex. The coupled equations for competitive binding were numerically
fitted with DataFitter Software (D. Veprintsev, unpublished). The affinities of
SV2C-LD mutants were independently confirmed by determining the affinity
from the apparent Kd of the BoNT/A-RBD-SV2C-LD-F complex in the presence
of unlabelled SV2C-LD™.
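
A one-site fit of a direct anisotropy titration can be sketched as below. This is an illustration under stated assumptions, not the Origin 7 or DataFitter procedure: it uses the exact 1:1 binding equation with ligand depletion (the source does not say whether depletion was modelled), treats the free and bound anisotropies as known, and replaces a proper least-squares optimiser with a simple log-spaced grid search. All function names and data are hypothetical.

```python
import math


def fraction_bound(titrant_nM, labelled_nM, kd_nM):
    """Fraction of labelled protein in complex for exact 1:1 binding."""
    total = titrant_nM + labelled_nM + kd_nM
    complex_nM = (total - math.sqrt(total * total - 4.0 * titrant_nM * labelled_nM)) / 2.0
    return complex_nM / labelled_nM


def anisotropy(titrant_nM, labelled_nM, kd_nM, r_free, r_bound):
    """Observed anisotropy as a linear mix of free and bound signals."""
    return r_free + (r_bound - r_free) * fraction_bound(titrant_nM, labelled_nM, kd_nM)


def fit_kd(titrant, observed, labelled_nM, r_free, r_bound, lo=0.1, hi=1e5, steps=2000):
    """Return the Kd (nM) on a log-spaced grid that minimises the squared error."""
    best_kd, best_sse = lo, float("inf")
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        sse = sum(
            (anisotropy(t, labelled_nM, kd, r_free, r_bound) - r) ** 2
            for t, r in zip(titrant, observed)
        )
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic, noise-free titration generated with Kd = 28 nM (made-up anisotropies).
titrant = [0, 5, 10, 20, 50, 100, 200, 500, 1000]
observed = [anisotropy(t, 100.0, 28.0, 0.05, 0.20) for t in titrant]
print(round(fit_kd(titrant, observed, 100.0, 0.05, 0.20), 1))
```

Because the labelled protein (100 nM) is above the Kd of the wild-type pair, the depletion-aware form is used here instead of the simple hyperbola; a real analysis would also fit the two anisotropy plateaus.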

Animals and cells. All animal experiments were performed in compliance with the 
guidelines for the welfare of experimental animals issued by the federal government 
of The Netherlands. All animal experiments were approved by the Animal Ethical 
Review Committee (DEC) of Utrecht University. HEK293T cells were obtained 
from the American Type Culture Collection and tested for mycoplasma. 

DNA constructs and antibodies for functional studies. The SV2C-Flag con- 
struct was generated by PCR using the human full-length cDNA (Origene) as a 
template. The insert was cloned into a GW1-CMV vector, and the sequence was 
verified. The pGW2-mRFP construct was generated as described*. The following
primary antibodies were used: goat anti-SV2C (1:400, Santa Cruz Biotechnology, 
clone P-20, order no. sc-11946), rabbit anti-VGAT (1:400, Synaptic Systems, order 
no. 131003), rabbit anti-Flag (1:1,000, Sigma, order no. F7425), mouse anti-His
(1:400, NeuroMab, clone N144/14, order no. 75-169) and mouse anti-tubulin-β3
(TUBB3, 1:400, Sigma, order no. PRB-435P). The following secondary antibodies 
were used: Alexa Fluor 488-conjugated goat anti-mouse antibodies (order no. A11029); 
Alexa Fluor 568-conjugated goat anti-rabbit antibodies (order no. A11036); Alexa 
Fluor 647-conjugated goat anti-rabbit antibodies (order no. A21245) and Alexa 
Fluor 647-conjugated goat anti-mouse antibodies (order no. A21236); Alexa Fluor 
488 (order no. A11055)- and Cy5 (order no. 705-175-147)-conjugated donkey 
anti-goat antibodies; and Cy3-conjugated donkey anti-rabbit antibodies (order 
no. 711-165-152). Alexa Fluor-conjugated antibodies were purchased from 
Invitrogen, and cyanine-dye-conjugated antibodies were purchased from Jackson 
ImmunoResearch. All secondary antibodies were used 1:400. 

Blocking of BoNT/A-RBD uptake by neurons. Primary striatal cultures were
prepared from embryonic day 18 rat brains, as described for hippocampal neurons”*.
Cells were plated on coverslips coated with 35 µg ml⁻¹ poly-L-lysine (Sigma) and
5 µg ml⁻¹ laminin (Roche) at a density of 100,000 cells per well. Striatal cultures
were grown in neurobasal medium (NB, Invitrogen) supplemented with B27
(Invitrogen), 0.5 mM glutamine (Invitrogen), 12.5 µM glutamate (Sigma) and
penicillin/streptomycin mix (Invitrogen) at 37 °C in 5% CO₂.

At 18 days in vitro (DIV), cultured striatal neurons were stimulated using a high
K⁺ buffer (70 mM KCl, 51.5 mM NaCl, 25 mM HEPES, 30 mM glucose, 2 mM
CaCl₂ and 2 mM MgCl₂; pH 7.4) with or without the GST-BoNT/A-A2 peptide
(5 µM) or the GST control (5 µM), or neurons were treated with a control buffer
(2.5 mM KCl, 119 mM NaCl, 25 mM HEPES, 30 mM glucose, 2 mM CaCl₂ and
2 mM MgCl₂; pH 7.4), for 15 min at 37 °C. The neurons were incubated with GFP-
BoNT/A-RBD (200 nM) or GFP only (200 nM) for 10 min at 37 °C. The cells were
then washed with high K⁺ or control buffer solution and fixed with 4%
paraformaldehyde/4% sucrose in PBS for 10 min.

Blocking of BoNT/A-RBD binding in HEK293T cells. HEK293T cells were 
cultured in DMEM/Ham’s F-10 (50/50) medium supplemented with 10% FCS 
and 1% penicillin/streptomycin at 37 °C in 5% CO₂ and were transfected with
pGW2-mRFP and SV2C-Flag, or pGW2-mRFP only, using the FuGENE 6
reagent (Roche). After 24 h, cells were incubated with fresh culture medium
containing GST-BoNT/A-A2 or the GST control for 15 min at 37 °C. Next, cells were
incubated with BoNT/A-RBD (100 nM) for 10 min at 37 °C, washed with fresh 
culture medium and fixed with 4% paraformaldehyde/4% sucrose in PBS for 
10 min. 

Immunohistochemistry, confocal imaging and image analysis. After fixation, 
cells were washed three times with PBS for 5 min and incubated with the indicated 
primary antibodies in GDB buffer (0.2% BSA, 0.8 M NaCl, 0.5% Triton X-100 and 
30 mM phosphate buffer; pH 7.4) overnight at 4°C. Neurons were then washed 
three times for 5 min with PBS, incubated with secondary antibodies in GDB 
buffer for 1 h at room temperature and washed again three times for 5 min with 
PBS. Coverslips were mounted using VECTASHIELD mounting medium (Vector 
Laboratories). 

Confocal images were acquired using an LSM 700 confocal laser-scanning 
microscope (Zeiss) using a 40× (1.3 NA) oil objective. Each image was acquired
from a z-series of 5 images, each averaged twice, and was chosen to cover the entire
region of interest from top to bottom using a 0.47-µm interval. The resultant
z-stack was ‘flattened’ into a single image using maximum intensity projection. 
All image acquisition settings were kept constant throughout the experiments. 
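
The maximum intensity projection mentioned above takes, for every pixel position, the brightest value found across the z-series. A minimal sketch on plain nested lists follows; the function name and the toy stack are illustrative assumptions, and in practice the acquisition software or ImageJ performs this step.

```python
def max_intensity_projection(z_stack):
    """Collapse a z-stack (list of equally sized 2D slices) into one 2D image
    by taking the pixel-wise maximum across slices."""
    if not z_stack:
        raise ValueError("z_stack must contain at least one slice")
    rows, cols = len(z_stack[0]), len(z_stack[0][0])
    return [
        [max(plane[r][c] for plane in z_stack) for c in range(cols)]
        for r in range(rows)
    ]


# Toy two-slice stack of 2 x 2 pixels (arbitrary intensities).
stack = [
    [[1, 5], [3, 2]],
    [[4, 0], [3, 9]],
]
print(max_intensity_projection(stack))  # [[4, 5], [3, 9]]
```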

Image analysis was performed using ImageJ software (National Institutes of 
Health). For analysis of the amount of synaptic GFP-BoNT/A-RBD uptake by 
cultured striatal neurons, the ImageJ plug-in ‘Puncta Analyzer’ was used to auto- 
matically count the number of VGAT- and GFP-BoNT/A-RBD positive puncta. 
Threshold values were held constant throughout the experiments. To exclude 
nonspecific background signal, a filter excluding puncta smaller than 0.6 µm²
and larger than 6.3 µm² was applied. The number of co-localizing VGAT and
GFP-BoNT/A-RBD puncta was normalized to the total number of VGAT-positive 
puncta to correct for changes in neuron density per image. For analysis of the 
amount of BoNT/A-RBD binding in HEK293T cells, the total area of BoNT/A- 
RBD was normalized to the total area of mRFP-transfected HEK293T cells above 
an assigned threshold. Threshold values were kept constant throughout the experi- 
ments. Thresholded images were processed using a median filter to remove noise. 
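
The size filter and the two normalizations described above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names and example numbers are hypothetical, and the puncta detection itself (done by the ImageJ plug-in in the source) is assumed to have already produced areas and counts.

```python
def filter_puncta(areas_um2, min_area=0.6, max_area=6.3):
    """Keep puncta whose area lies within [min_area, max_area] in square micrometres."""
    return [a for a in areas_um2 if min_area <= a <= max_area]


def colocalized_fraction(n_colocalized, n_vgat_positive):
    """Co-localizing puncta normalized to all VGAT-positive puncta in the image."""
    if n_vgat_positive == 0:
        raise ValueError("no VGAT-positive puncta in image")
    return n_colocalized / n_vgat_positive


def normalized_binding_area(bont_area, mrfp_area):
    """BoNT/A-RBD area as a percentage of the mRFP-positive (transfected) cell area."""
    if mrfp_area == 0:
        raise ValueError("no mRFP-positive area in image")
    return 100.0 * bont_area / mrfp_area


# Hypothetical measurements for a single image.
print(filter_puncta([0.2, 0.6, 1.5, 6.3, 7.0]))   # [0.6, 1.5, 6.3]
print(colocalized_fraction(30, 120))               # 0.25
print(normalized_binding_area(210.0, 500.0))       # 42.0
```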

Data representation and statistics. The number of cells analysed was based on
previous data sets’’. For all experiments, N (number of replicates) represents
biological replicates. In all bar graphs, the data are presented as the mean values ± s.e.m.
Statistical analysis was performed using SPSS Statistics v20 (IBM). The non-parametric
Mann-Whitney U test was used to test for statistical significance. P values below
0.05 are considered significant (*P < 0.05; **P < 0.01; ***P < 0.001; NS, not
significant). All tests were performed two-sided. The data samples were tested
for normal distribution with the Kolmogorov-Smirnov test and for heterogeneity
of variance with Bartlett’s test.
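
The summary statistic and the rank-based test statistic named above can be sketched with the Python standard library. This is an illustration, not the SPSS procedure: it computes the mean ± s.e.m. and the Mann-Whitney U statistics only, and deliberately stops short of a P value, which would need the exact or normal-approximation null distribution. The sample values are made up.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): U_a counts pairs where a > b, with ties counted as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical per-image percentages for two treatment groups.
control = [38.0, 45.5, 41.2, 50.3]
peptide = [5.1, 8.4, 6.9, 7.7]
print(mean_sem(control))
print(mann_whitney_u(control, peptide))  # (16.0, 0.0): every control value exceeds every peptide value
```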
Extended Data Figure 1 | Structure of the SV2C luminal domain. Side view
(a) and top view (b) of SV2C-LD chain D of the complex structure. One full
turn of the β-helix comprises 20 amino acids. The central hydrophobic core of
the β-helix is mostly formed by stacked, slightly tilted phenylalanine residues.
Cartoon representation: the helix, β-strands and loops are shown in red,
yellow and green, respectively. The side chains are shown as lines in green and
in atom colours. The 3₁₀-helix is indicated. The flexible N- and C-terminal
regions that were not visible in the structure are schematically indicated as
dotted lines.
Extended Data Figure 2 | Binding of BoNT/A-RBD to SV2C-LD is reduced
on acidification. Normalized fluorescence anisotropy titration of
BoNT/A-RBD with labelled SV2C-LD (a) and displacement with unlabelled
SV2C-LD at pH 7.5 (green) and pH 5 (red) (b). The affinity of SV2C-LD for
BoNT/A-RBD at pH 5 is reduced by a factor of ~5. For values, see Extended
Data Table 3.
Extended Data Figure 3 | Interaction between SV2C-LD and
BoNT/A-RBD. a, Overview of the prominent interactions that were analysed
by site-directed mutagenesis. The colour code in Fig. 1 is used. b, SDS-PAGE
analysis of the pull-down assays. The 6×His-tagged BoNT/A domain
(~50 kDa) and the untagged SV2C domain (~15 kDa) are indicated by
arrows. c-f, Close-up views of specific interactions. c, The hydrogen bonds of
N559. d, The hydrogen bonds of T1145/T1146. e, The cation-π stacking
interaction between BoNT/A-RBD R1156 and SV2C-LD F563. f, The putative
long-range electrostatic interactions of BoNT/A-RBD R1294. R1294 is not
defined in the electron density of the complex structure and hence does not
participate in hydrogen bonds or salt bridges. Nevertheless, mutagenesis of
R1294 to alanine strongly reduces the binding of BoNT/A-RBD to SV2C-LD.
We speculate that long-range electrostatic interactions between the positively
charged BoNT/A-RBD arginine (depicted as a surface coloured according to
the electrostatic potential) and the negatively charged regions in SV2C-LD have
a role in complex formation. g, Sequences and schematic representations of the
peptides that were used for the complex inhibition studies
(NH2-RGSVMTTNIYLNSS-COOH, NH2-RGSVMTTN-COOH and
NH2-DSEFKNCSFFHNK-COOH).
Extended Data Figure 4 | Kd determination of wild-type and mutant
BoNT/A-RBD and SV2C-LD proteins. The affinities of wild-type and mutant
SV2C-LD for BoNT/A-RBD were determined by fluorescence anisotropy
titration of labelled SV2C-LD (a, c, f) and subsequent displacement with
unlabelled SV2C-LD (b), SV2C-LD F563A (d) or SV2C-LD N559A
(g). Alternatively, the affinities of SV2C-LD mutants were calculated from the
apparent Kd of labelled SV2C-LD in the presence (green) or absence (red)
of SV2C-LD F563A (22.6 µM, e) or SV2C-LD N559A (18.2 µM, h). The
affinities of BoNT/A-RBD R1294A (i, j), BoNT/A-RBD R1156E (k, l) and
BoNT/A-RBD T1145A/T1146A (m, n) for SV2C-LD were determined
accordingly by anisotropy displacement titrations. For values, see Extended
Data Table 3.
Extended Data Figure 5 | The BoNT/A-A2 peptide inhibits the
internalization of BoNT/A-RBD by striatal neurons. Representative
images of GFP-BoNT/A-RBD uptake by cultured striatal neurons (DIV18).
Neurons were pre-incubated with the GST-BoNT/A-A2 or the GST
control (5 µM, 15 min) in high K⁺ or control buffer and treated with
GFP-BoNT/A-RBD or GFP only (200 nM, 10 min) and stained for the
neuronal marker tubulin-β3 (TUBB3) and endogenous VGAT to label
presynaptic terminals. Scale bar, 50 µm. Representative images are from a total
of 20 images per group, N = 2 independent experiments.
Extended Data Figure 6 | The BoNT/A-A2 peptide inhibits the binding of
BoNT/A-RBD in HEK293T cells. a, A typical example of BoNT/A-RBD
(green) binding to SV2C-Flag (blue) in HEK293T cells. Cells were transfected
with SV2C-Flag and mRFP (red) to highlight transfected cells, fixed and
stained using Flag- and His-tagged antibodies. DAPI
(4’,6-diamidino-2-phenylindole, purple) was used for visualizing all cell nuclei.
Scale bar, 50 µm.
b, Quantification of BoNT/A-RBD binding in SV2C-Flag expressing
HEK293T cells. Cells were transfected with SV2C-Flag and mRFP, or
mRFP only, and incubated with GST-BoNT/A-A2 peptide or GST control
(5 µM, 15 min) and treated without (control) or with BoNT/A-RBD (100 nM,
10 min). The total area of BoNT/A-RBD was normalized to the total area of
mRFP-transfected cells. No BoNT/A-RBD: 1.24 ± 0.64%, n = 8; no
SV2C-Flag: 0.82 ± 0.64%, n = 5; no GST-BoNT/A-A2: 42.13 ± 7.44%, n = 9;
5 µM GST-BoNT/A-A2: 6.92 ± 1.93%, n = 9; 5 µM GST: 24.55 ± 8.01%,
n = 9; Mann-Whitney U test, ***P < 0.001, *P < 0.05, NS not significant,
n = number of images analysed per group, N = 2 independent experiments.
All tests were performed two-sided. Data are presented as the mean ± s.e.m.
Extended Data Table 1 | Data collection and refinement statistics

                          BoNT/A-RBD / SV2C-LD complex
Data collection
Space group               C 1 2 1 (twin operator −h, −k, l)
Cell dimensions
  a, b, c (Å)             115.44, 105.26, 127.96
  α, β, γ (°)             90, 90.02, 90
Resolution (Å)            20-2.3 (2.4-2.3)*
Rmeas                     21.3 (102.8)
CC1/2                     99.1 (68.4)
I/σI                      8.3 (2.1)
Completeness (%)          99.8 (99.9)
Redundancy                6.8 (6.7)

Refinement
Resolution (Å)            20-2.3
No. reflections           67843
Rwork/Rfree               0.235/0.269
No. atoms
  Protein                 9318
  Ligand/ion              8
  Water                   82
B-factors
  Protein                 38.9
  Ligand/ion              29.1
  Water                   20.0
R.m.s. deviations
  Bond lengths (Å)        0.008
  Bond angles (°)         1.079

*The highest resolution shell is shown in parentheses.
Extended Data Table 2 | SV2C-LD-BoNT/A-RBD hydrogen bonds

Residue (BoNT/A chain A)   Distance (Å)   Residue (SV2C chain D)
Thr 1146 [N]               3.1            Phe 557 [O]
Thr 1146 [OG1]             3.6            Phe 557 [O]
Met 1144 [N]               3.3            Cys 560 [O]
Ser 1142 [N]               3.2            Phe 562 [O]
Thr 1146 [OG1]             3.0            Phe 557 [N]
Met 1144 [O]               3.5            Asn 559 [N]
Thr 1145 [OG1]             3.0            Asn 559 [N]
Thr 1145 [OG1]             3.5            Asn 559 [ND2]
Tyr 1149 [OH]              2.8            Asn 559 [ND2]
Met 1144 [O]               3.3            Cys 560 [N]
Ser 1142 [O]               2.8            Phe 562 [N]

The table summarizes the hydrogen bonds of the interface between chain A and chain D.
Extended Data Table 3 | Kd values of the interaction of wild-type and mutant proteins


BoNT/A-RBD variant  wt  wt  wt  wt  R1294A  R1156E  TT1145/6AA
SV2C-LD variant  wt  wt, pH 5  F563A  N559A  wt  wt  wt
Kd labelled [nM]  28±3  260±40  16±5  23±9  610±30  12,800±500  37,000±2,000
Kd unlabelled [nM]a  260±20  1,400±100  1,700±400  400±20  1,000±200  ≥7,000  ≥25,000
Kd app [nM]  143±7  500±70
Kd unlabelled [nM]b  2,800±1,000  850±150
Pull-down [interaction]  strong  moderate  strong  moderate  moderate  no


a, Displacement with unlabelled protein. b, In the presence of the unlabelled mutant SV2C-LD. Please note that the affinity of SV2C-LD for BoNT/A-RBD R1156E and T1145A/T1146A is reduced to an extent that
allows only determination of a lower boundary for the affinity of unlabelled SV2C-LD (Extended Data Fig. 4l, n). However, these values are consistent with the reduced binding affinity of labelled SV2C-LD and the
pull-down experiments.
Coupled GTPase and remodelling ATPase activities
form a checkpoint for ribosome export 


Yoshitaka Matsuo1, Sander Granneman2, Matthias Thoms1, Rizos-Georgios Manikas1, David Tollervey2 & Ed Hurt1


Eukaryotic ribosomes are assembled by a complex pathway that 
extends from the nucleolus to the cytoplasm and is powered by 
many energy-consuming enzymes’ *. Nuclear export is a key, irre- 
versible step in pre-ribosome maturation‘ *, but mechanisms under- 
lying the timely acquisition of export competence remain poorly 
understood. Here we show that a conserved Saccharomyces cerevi- 
siae GTPase Nug2 (also known as Nog2, and as NGP-1, GNL2 or 
nucleostemin 2 in human’) has a key role in the timing of export 
competence. Nug2 binds the inter-subunit face of maturing, nucleo- 
plasmic pre-60S particles, and the location clashes with the position 
of Nmd3, a key pre-60S export adaptor’’. Nug2 and Nmd3 are not 
present on the same pre-60S particles, with Nug2 binding before 
Nmd3. Depletion of Nug2 causes premature Nmd3 binding to the 
pre-60S particles, whereas mutations in the G-domain of Nug2 block 
Nmd3 recruitment, resulting in severe 60S export defects. Two pre-60S
remodelling factors, the Rea1 ATPase and its co-substrate Rsa4,
are present on Nug2-associated particles, and both show synthetic 
lethal interactions with nug2 mutants. Release of Nug2 from pre-60S
particles requires both its K⁺-dependent GTPase activity and
the remodelling ATPase activity of Rea1. We conclude that Nug2 is
a regulatory GTPase that monitors pre-60S maturation, with release 
from its placeholder site linked to recruitment of the nuclear export 
machinery. 

The conserved GTPase Nug2 (Extended Data Fig. 1) is associated 
with several pre-60S particles located in the nucleoplasm, but was not 
detected on particles with a known cytoplasmic location (see also 
Extended Data Fig. 2). The bacterial homologue of Nug2, YlqF, binds
directly to ribosomal RNA!!, and we therefore used the ultraviolet 
cross-linking and analysis of complementary DNA (CRAC) method 
to localize the binding site for yeast Nug2 within the pre-60S particle”. 
Direct contacts for Nug2 were identified only with the 25S rRNA, at 
sites in helices H38, H69, H71, H80, H81-83, H84-86, H89, H91-92 
and H93 (Fig. 1a, c). Yeast three-hybrid analyses confirmed interactions
between Nug2 and these rRNA helices (Fig. 1b). Mapping the
major rRNA crosslink sites of Nug2 onto the 60S subunit structure 
(Fig. 1d) showed a distinct cluster on the inter-subunit joining surface™’. 
Nug2-binding sites overlap with regions occupied by the export factor 
Nmd3 in cryo-electron microscopy’®. CRAC was therefore applied to 
Nmd3 to identify its binding sites more precisely, which were found to 
lie in H38, H69 and H89 of 25S rRNA (Fig. 1a, c, e). Notably, the Nug2-
and Nmd3-binding sites overlapped in H38, H69 and H89 (Fig. 1c, f),
suggesting that the binding of these two proteins is mutually exclusive. 
To test this, pre-60S particles were purified with tagged Nug2 and 
shown to lack detectable Nmd3, and vice versa (Extended Data Fig. 2). 
Nmd3 is an essential nuclear export factor that recruits the export receptor
Crm1 to the nascent 60S subunits'*”’. These observations suggested
that Nug2 acts as a ‘placeholder’ to prevent premature recruitment of 
Nmd3 to earlier, export-incompetent pre-60S particles. 

Like other GTP-binding proteins, Nug2 has characteristic G1, G3 
and G4 motifs in its G-domain (Fig. 2a and Extended Data Fig. 1), 


suggesting that GTP-binding or hydrolysis'* might regulate dynamic 
interactions between Nug2 and the pre-ribosome. Dominant-negative 
mutations were previously described in two GTPases involved in ribo- 
some biogenesis, the G1 motif of Lsg1 (Lys349Asn/Arg/Thr)”’ and the 
G3 motif of Nog1 (Gly224Ala)'*. Orthologous G1- and G3-motif mutants,
nug2(K328R) and nug2(G369A), respectively (Fig. 2a and Extended 
Data Fig. 1), each showed severe growth defect phenotypes (Fig. 2b), 
and were also dominant-negative when overexpressed in the presence 
of chromosomal NUG2 (Fig. 2c). Pre-ribosome analysis by sucrose gradient 
centrifugation showed that the Nug2(Lys328Arg) and Nug2(Gly369Ala) 
proteins were efficiently assembled into pre-60S subunits, but induced 
a ‘half-mer’ polysome phenotype (in particular for Nug2(Lys328Arg)), 
characteristic of reduced 60S subunit synthesis (Fig. 2d). The reduced 
60S levels were more apparent under low Mg²⁺ conditions that cause
80S ribosomes to dissociate into 60S and 40S subunits (Fig. 2d). The 
nug2(K328R) and nug2(G369A) strains showed nuclear accumulation 
of an enhanced green fluorescent protein (eGFP)-containing RpL25- 
eGFP reporter, but not RpS3-eGFP, revealing a specific block in pre-60S 
nuclear export (Fig. 2e). We conclude that mutations in the GTPase 
domain of Nug2 allow recruitment to the pre-ribosomes, but block 
nuclear export. 

To determine the basis of the defects associated with Nug2(Lys328Arg) 
and Nug2(Gly369Ala), we assayed in vitro guanine-nucleotide-binding 
activity and GTP hydrolysis. Nug2 from S. cerevisiae was unstable when 
expressed in Escherichia coli (data not shown). By contrast, good yields 
were obtained for wild-type and mutant Nug2 from the eukaryotic 
thermophile Chaetomium thermophilum (ctNug2, ctNug2(Lys339Arg) 
and ctNug2(Gly380Ala), respectively; Fig. 2f), the thermostable proteins 
of which have superior biochemical properties’’. ctNug2 is highly homo- 
logous to yeast Nug2 (74% identity; Extended Data Fig. 1), and can 
complement, albeit not perfectly, a yeast nug2Δ mutant (Extended
Data Fig. 3). As Nug2 may act as a potassium-dependent GTPase”, 
we tested the cation requirement for GTP hydrolysis. The GTPase 
activity of ctNug2 was low in NaCl-containing buffer, but was substan- 
tially stimulated by KCl (Fig. 2f). By contrast, ctNug2(Lys339Arg) 
and ctNug2(Gly380Ala) exhibited only background GTPase activity 
(Fig. 2f). In binding assays, wild-type ctNug2 and ctNug2(Gly380Ala) 
readily bound the fluorescent nucleotides MANT-GTP or MANT- 
GDP, whereas ctNug2(Lys339Arg) did not (Fig. 2g). We conclude that 
ctNug2(Lys339Arg) is defective in GTP binding, whereas ctNug2(Gly380Ala) 
binds but cannot hydrolyse GTP. This K⁺-stimulated GTPase activity
might regulate the interaction of Nug2 with nascent 60S particles. 

Nug2 is associated with nucleoplasmic pre-60S particles that also
carry the Rix1-Ipi1-Ipi3 heterotrimer, the dynein-related AAA-ATPase
Rea1, and its co-substrate Rsa4 (Extended Data Fig. 2; see also below
and ref. 21). The enzymatic activity of Rea1 is required for the release of
Ytm1 (ref. 22) and Rsa4 (ref. 21) and a genetic screen revealed synthetic
lethality between the G1-motif mutant nug2(K328R) and the mutant
alleles rea1-DTS and rsa4-1 (ref. 21) (Fig. 3a). We therefore investigated
whether ATP-dependent remodelling of the Rix1 particle by the


1Biochemie-Zentrum der Universität Heidelberg, Im Neuenheimer Feld 328, Heidelberg D-69120, Germany. 2Wellcome Trust Centre for Cell Biology, The University of Edinburgh, Edinburgh EH9 3JR, UK.
Figure 1 | Nug2 binds to the inter-subunit face of the pre-60S subunit, clashing
with export factor Nmd3. a, CRAC analyses of Nug2 and Nmd3 (performed
twice; only sites reproducibly found in both data sets were considered).
The total number of hits was plotted against the relative location along the
rDNA. ETS, external transcribed spacer; ITS, internal transcribed spacer
region. A0, A1, A2 and D indicate RNA cleavage sites within the pre-rRNA unit.
b, Yeast three-hybrid analysis revealing interaction between Nug2 and 


AAA-ATPase activity of Rea1 (ref. 21) is altered in particles containing
Nug2(Lys328Arg) or Nug2(Gly369Ala). Pre-60S particles carrying
Flag-tagged RpL3 were affinity purified using a tandem affinity purification
(TAP)-tagging technique (Rix1-TAP) via IgG binding and
tobacco etch virus (TEV) elution. The pre-60S particles were incubated
in vitro to allow factor release, and then re-isolated on Flag beads via
RpL3-Flag (Fig. 3b). Consistent with previous data”’, incubation of the
pre-60S particles with ATP in Na⁺-containing buffer resulted in the
release of Rsa4 and Rea1, but not Nug2 (Fig. 3c). By contrast, incubation
in K⁺-containing buffer caused the ATP-dependent release of Nug2,
in addition to Rsa4 and Rea1 (Fig. 3c). Incubation with GTP in Na⁺- or
K⁺-containing buffer did not induce the release of biogenesis factors
(Fig. 3d). However, neither Nug2(Lys328Arg) nor Nug2(Gly369Ala)
could be dissociated from pre-60S particles after ATP treatment in K⁺
buffer (Fig. 3e). In the case of Nug2(Lys328Arg) (defective in GTP
binding), incubation with ATP in K⁺ buffer failed to release Rsa4,
whereas pre-60S particles carrying Nug2(Gly369Ala) (defective in GTP
hydrolysis) still showed Rsa4 release after ATP treatment (Fig. 3e).
Mutation of one of the six ATP-binding protomers of Rea1 (AAA2;
rea1 Lys659Ala) inhibited remodelling, including Nug2 release (Extended
Data Fig. 4). These findings indicate that the GTP-binding activity of
Nug2 influences the remodelling activity of the Rea1 ATPase, whereas
GTP hydrolysis is necessary for the final Nug2 release from the pre-60S
subunit.
In vitro, Rea1-dependent release of Rsa4 and Nug2 required only
ATP and K⁺ without the addition of GTP, whereas the mutational
analyses suggested that GTPase activity is necessary for Nug2 release.
These findings suggest that Nug2 on the Rix1 particle might have
retained bound GTP during purification (which is possible owing to
its low intrinsic GTPase activity). Alternatively, ribosome-associated
nucleotide diphosphate kinases can transfer the γ-phosphate from
ATP to GDP to generate GTP-loaded GTPases”.


identified 25S rRNA fragments. Negative control, empty vector and H25. 3AT, 
3-amino-1,2,4-triazole. c, Nug2- (yellow) and Nmd3- (green) binding sites 
identified by CRAC and highlighted in the indicated 25S rRNA. d, e, Mapping 
of CRAC Nug2- (yellow) and Nmd3- (green) binding sites on the 60S structure 
(PDB code 3O5H; ref. 13). f, Overlapping binding sites (red) of Nug2 (yellow)
and Nmd3 (green). 


The pre-60S particles co-purified with Rix1 also contained small 
amounts of Ytm1 and Erb1 (Fig. 3c), which were previously described 
as nucleolar co-substrates for Rea1 (ref. 22), and both were released by
incubation with ATP in Na⁺ or K⁺ buffer (Fig. 3c).

To determine the step in 60S subunit biogenesis at which dissoci- 
ation of Nug2 is disturbed in vivo, we affinity-purified different pre-60S 
particles from Nug2 wild-type and mutant cells using bait proteins that 
specifically enrich nucleolar, nucleoplasmic or cytoplasmic intermedi- 
ates (Fig. 4a). The nug2(K328R) mutation did not markedly alter the 
biochemical composition of most pre-60S particles tested. The exception 
was Arx1-associated particles, which showed a marked depletion of the 
export adaptor Nmd3 and the cytoplasmic factor Rei1 that stimulates
recycling of Arx1 (ref. 25) (Fig. 4a). Nmd3 was also largely absent from
Arx1 particles purified from Nug2(Gly369Ala) cells (Fig. 4b).

To test the model that Nug2 depletion allows premature recruitment 
of Nmd3, we used an auxin-inducible degron system. Nug2 was 
expressed as a fusion protein (sAid-Nug2-sAid) with two copies of the 
sAid (small auxin-inducible degron) tag, which is targeted by the 
F-box E3 ubiquitin ligase TIR1 in the presence of auxin, inducing fast 
proteasomal degradation (Extended Data Fig. 5). Nmd3 is normally 
not detected on Rix1-associated particles, but was prematurely recruited 
to this pre-60S particle after Nug2 depletion (Fig. 4c). Concomitant 
with Nmd3 association, the recovery of Rea1 and Rsa4 decreased during 
Nug2 depletion (Fig. 4c). We conclude that Nug2 promotes the 
stable association of Rsa4 and Rea1 with the Rix1 particles, while 
blocking premature recruitment of Nmd3. 

To address the timing of Nug2 recruitment to pre-60S particles in 
comparison to Rea1, Rsa4 and the Rix1-Ipi1-Ipi3 complex, we used a 
combination of affinity purification and immunodepletion. Affinity 
purification of Nug2-TAP yielded a mixture of different pre-60S particles 
including Rix1/Rea1 particles. Rix1-Flag immunoprecipitation 
was used to deplete Rix1/Rea1 particles from this mixture, leaving 
Figure 2 | K+-dependent GTPase activity of Nug2. a, Domain organization 
of Nug2. C, carboxy; N, amino. b, Complementation of nug2Δ cells by NUG2, 
nug2(K328R) and nug2(G369A) on YPD plates. c, Repression (+ doxycycline) 
and overexpression (− doxycycline) of NUG2, nug2(K328R) and nug2(G369A) 
in NUG2 cells. d, Polysomal (10 mM MgCl2; top) and ribosomal (1.5 mM 
MgCl2; bottom) profiles of Nug2, Nug2(Lys328Arg) and Nug2(Gly369Ala) cells 
analysed by sucrose gradient centrifugation. Western analysis of gradient 
fractions using antibodies against Nug2 and RpL35. e, Subcellular distribution 


Nug2 particles that contained Rsa4 and several intermediate pre-60S 
factors including Nog1, Arx1, Nug1, Nop53, Nsa3, Rpf2, Rlp7 and 
Nsa2 (Fig. 4d). However, this Nug2 particle lacked other (further 
upstream) pre-60S factors such as Ytm1, Erb1 and Has1, suggesting 
that it corresponds to the precursor particle to which Nug2 was recruited. 
These data complement previous findings that Nug2 is the last ‘B-factor’ 
to associate with pre-ribosomes after dissociation of Has1 (ref. 27). 




Figure 3 | Nug2 release from pre-60S particles requires intrinsic 
K+-dependent GTPase and Rea1 ATPase activity. a, Synthetic lethality between 
alleles rsa4-1 (ref. 21) or rea1-DTS and nug2(K328R) revealed by growth on 
5-fluoroorotic acid (5-FOA). b-e, ATP-dependent release of Rsa4 and 
Nug2 from purified pre-60S particles. Scheme of the release assay (b) and 
experimental analyses (c-e). Affinity-purified Rix1 particles carrying wild-type 
or mutant Nug2 were incubated with ATP or GTP in NaCl or KCl buffer, before 
of RpL25-eGFP and RpS3-eGFP in NUG2 and nug2 mutant cells analysed 
by fluorescence microscopy. DIC, differential interference contrast; mRFP, 
monomeric red fluorescent protein. f, GTPase activity of purified ctNug2 
(SDS-PAGE; left) analysed by thin-layer chromatography/autoradiography 
(middle). Ratio of hydrolysed phosphate/total GTP plotted against time (right). 
g, Binding of fluorophores MANT-GTP (left) and MANT-GDP (right) to 
purified wild-type and mutant ctNug2. GTPase and binding assays were 
performed twice yielding highly reproducible data sets. 


As outlined in Fig. 4e, we propose that a previously uncharacterized 
step in the reorganization of the evolving pre-60S subunit primes it 
for nuclear export. This involves a regulatory GTPase, Nug2, whose binding 
site overlaps with that of the essential nuclear export adaptor Nmd3. 
As long as intranuclear maturation is incomplete, the pre-60S subunit 
cannot be exported, because recruitment of this essential export factor is 
not possible. However, a late nucleoplasmic remodelling step, catalysed 
matured pre-60S particles were re-isolated via RpL3-Flag affinity-purification. 
Final eluates were analysed by SDS-PAGE and Coomassie staining (top; 
indicated bands were identified by mass spectrometry) and western blotting 
using the indicated antibodies (bottom) (c-e). CBP, calmodulin-binding 
peptide. All in vitro assays were performed at least twice with highly 
reproducible data sets. 
Figure 4 | Nug2 release from the pre-60S subunit is linked to Nmd3 
recruitment. a, Affinity-purification of the indicated TAP-tagged pre-60S 
factors from NUG2 or nug2(K328R) mutant cells. Asterisks denote position of 
Rei1 identified by mass spectrometry. WT, wild type. b, Affinity-purification 
of Arx1-TAP from NUG2, nug2(K328R) and nug2(G369A) cells. r-protein 
denotes ribosomal proteins. c, Affinity-purification of Rix1-TAP from 
sAid-Nug2-sAid degron strain after time-dependent auxin treatment. 


by the AAA-ATPase Rea1 and its co-factor Rsa4, restructures the pre-60S 
particle, which could lead to both an rRNA and assembly factor 
rearrangement. This conformational change could also stimulate the 
K+-dependent GTPase activity of Nug2, thereby triggering its release 
from the matured pre-60S particles. We suggest that the Nug2 GTPase 
acts as a molecular switch to proofread pre-ribosome maturation and 
regulate the acquisition of export competence. After this reorganiza- 
tion step, the binding site for Nmd3 becomes accessible on the pre-60S 
subunit, which further triggers Crm1 and RanGTP recruitment to generate 
nuclear export competence. Thus, our data indicate coordination between 
a remodelling AAA-ATPase and a conformation-sensing GTPase. 

The human Nug2 orthologue GNL2 is highly expressed in proliferating 
cells including cancer cells, and is involved in the control of cell 
cycle progression. The discovery of the role of Nug2 during surveillance 
of ribosome biogenesis may help to reveal the molecular mechanisms 
by which nucleostemin family members interconnect the elementary 
cellular processes of ribosome biogenesis and cell proliferation. 


METHODS SUMMARY 

Materials and methods for TAP purification, CRAC analysis, purification of ctNug2, 
GTPase and guanine nucleotide binding assays are described in detail in the Methods. 
Yeast strains and plasmids used in this study are described in Extended Data Tables 1 
and 2. Adapters used for the CRAC analysis are described in Extended Data Table 3. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 


Yeast strains and genetic methods. The S. cerevisiae strains used in this study are 
listed in Extended Data Table 1. Gene disruption and C-terminal tagging were 
performed as previously described. 

Plasmid constructs. All recombinant DNA techniques were performed according 
to standard procedures using E. coli DH5α for cloning and plasmid propagation. 
Site-directed mutagenesis was performed by overlap-extension PCR. All cloned 
DNA fragments generated by PCR amplification were verified by sequencing. 
Plasmids used in this study are listed in Extended Data Table 2. 

CRAC analysis. The CRAC experiments were performed as described using the 
Nug2- and Nmd3-HTP (His6-TEV-ProtA) strain. CRAC data were processed 
using pyCRAC (S. Webb, R. D. Hector, G. Kudla and S.G., manuscript submitted). 
Cells were ultraviolet-irradiated in the Megatron UV chamber at a dose of 
1.6 J cm−2 and processed as described. The cDNAs from the Nug2 CRAC data 
were cloned into pCR4-TOPO (Invitrogen), and inserts were sequenced by Sanger 
sequencing. The cDNAs originating from Nmd3 CRAC experiments were sequenced 
on the Illumina MiSeq system (single-end 50 bp), according to the manufacturer’s 
procedures. The MiSeq CRAC data were processed using the pyCRAC software suite 
(S. Webb, R. D. Hector, G. Kudla and S.G., manuscript submitted; 
https://bitbucket.org/sgrann/pycrac). To remove potential PCR duplicates, the 
Nmd3 MiSeq data were collapsed using pyFastqDuplicateRemover. Reads were 
subsequently mapped to the yeast genomic reference sequence (version 2008) 
using novoalign (http://www.novocraft.com). Plots of reads aligned to the 35S 
reference sequence were generated using pyPileup and GNUplot. Adapters used 
in this experiment are listed in Extended Data Table 3. 
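
The duplicate-collapsing step can be illustrated with a minimal Python sketch. This is not the pyFastqDuplicateRemover implementation; it only shows the general idea of keeping one read per distinct combination of random barcode and insert sequence. The header convention used here (a barcode appended after "##") and the function names are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def collapse_duplicates(records: Iterable[FastqRecord]) -> List[FastqRecord]:
    """Keep the first record seen for each (barcode, sequence) pair."""
    seen = set()
    unique = []
    for header, seq, qual in records:
        # Hypothetical convention: the random barcode follows "##" in the header.
        barcode = header.split("##", 1)[1] if "##" in header else ""
        key = (barcode, seq)
        if key not in seen:
            seen.add(key)
            unique.append((header, seq, qual))
    return unique
```

Reads with the same insert but different random barcodes are kept as separate molecules, which is the reason for including random nucleotides in the 5' linkers.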

Expression and purification of ctNug2. The gene encoding C. thermophilum 
Nug2 (UniProtKB/TrEMBL accession: G0SBX1_CHATD) was cloned from cDNA 
by standard procedures as recently described. Subsequently, the ctNug2 was inserted 
into yeast or E. coli expression plasmids (see below). Because the C-terminal 
extension of ctNug2 (amino acids 511-627) is not conserved (Extended Data Fig. 1), 
ctNug2 amino acids 1-510 was cloned into pET21 vector for the in vitro 
experiments. ctNug2 was expressed by using pET-ctNug2-510-His6 plasmid in 
E. coli Rosetta-DE3 cells. Transformed cells were grown at 23 °C in LB medium 
until they reached an absorbance at 600 nm (A600 nm) of 0.6, at which point 
isopropyl-β-D-thiogalactoside (IPTG) was added to a final concentration of 0.1 mM. 
The cells were grown for an additional 3 h and then collected by centrifugation 
and stored frozen at −80 °C. Frozen pellets were resuspended in buffer KCl200 
(50 mM Tris, pH 8.0, 200 mM KCl, 5% glycerol, 0.01% NP-40 and 2 mM 
β-mercaptoethanol) with protease-inhibitor cocktail, and were broken by 
sonication (BANDELIN sonopuls 3200 with TITANTELLER TT13) on ice. Sonication 
was performed under these conditions: amplitude: 50%, 3 s on, 8 s off, 
processed for 10 min. 
The lysate was centrifuged at 39,000g for 30 min at 4 °C. The supernatant fraction 
was applied to a SP-sepharose column, and washed with buffer KCl200. ctNug2-His6 
was eluted by buffer KCl200 containing 300 mM KCl. Next, the eluate fraction 
was applied to a Ni-NTA column, and the column was washed with buffer 
KCl200. ctNug2-His6 was eluted with buffer KCl200 containing 250 mM imidazole, 
before it was finally dialysed against buffer KCl200. 

Measurement of GTPase activity by single-turnover reactions. The GTPase 
activity experiments were performed as previously described. ctNug2, 
ctNug2K339R or ctNug2G380A (1 μM) were incubated with a final concentration 
of 0.1 μM GTP containing 750 nCi of [γ-32P]-labelled GTP in buffer KCl300 
(50 mM Tris, pH 8.0, 300 mM KCl, 10 mM MgCl2 and 1 mM dithiothreitol 
(DTT)) or in buffer NaCl300 (50 mM Tris, pH 8.0, 300 mM NaCl, 10 mM MgCl2 
and 1 mM DTT) for the indicated time at 30 °C. After the reaction, the hydrolysed 
γ-phosphate was separated by thin-layer chromatography. 
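
As a rough illustration of how such single-turnover data might be quantified, the Python sketch below computes the ratio of hydrolysed phosphate to total GTP from spot intensities and estimates an observed rate constant. The first-order model f(t) = 1 − exp(−k·t) with complete hydrolysis at the endpoint, and the function names, are assumptions for illustration and are not taken from the Methods.

```python
import math
from typing import Sequence


def fraction_hydrolysed(pi_signal: float, gtp_signal: float) -> float:
    """Ratio of hydrolysed phosphate to total GTP from two spot intensities."""
    total = pi_signal + gtp_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return pi_signal / total


def single_turnover_rate(times: Sequence[float], fractions: Sequence[float]) -> float:
    """Estimate k_obs assuming f(t) = 1 - exp(-k t).

    Uses a least-squares line through the origin for -ln(1 - f) against t.
    """
    numerator = 0.0
    denominator = 0.0
    for t, f in zip(times, fractions):
        if not 0.0 <= f < 1.0:
            raise ValueError("fractions must lie in [0, 1)")
        numerator += t * (-math.log(1.0 - f))
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("at least one non-zero time point is required")
    return numerator / denominator
```

The units of the returned rate constant are the inverse of whatever time unit is supplied.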
Guanine-nucleotide-binding experiments. ctNug2, ctNug2K339R or ctNug2G380A 
(1 μM) were incubated with 0.1 μM MANT-GTP or MANT-GDP in buffer KCl300 
(50 mM Tris, pH 8.0, 300 mM KCl, 60 mM MgCl2, 20 mM EDTA and 1 mM 
DTT). MANT-GTP or MANT-GDP are analogues of natural GTP or GDP, where 
either the ribose 2'-hydroxy or the 3'-hydroxy group has been esterified by the 
fluorescent methylisatoic acid with an excitation/emission = 355/448 nm. The 
fluorescence quantum yield of MANT fluorophore is very low in water and 
increases significantly in non-polar solvents or after binding to most proteins. 
This highly environment-sensitive fluorescence of MANT makes MANT-GTP/GDP 
useful for directly detecting nucleotide-protein interactions. 
Accordingly, samples were excited at 355 nm with a xenon lamp, and emission 
spectra were recorded between 385 and 600 nm with a 5-nm increment step using a 
Synergy 4 spectrophotometer (BioTek). 

In vitro release assay. The Rix1 particle was affinity purified using IgG beads via 
Rix1-TAP from yeast strain (Rix1-TAP, RpL3-Flag) expressing NUG2, nug2(K328R) 
or nug2(G369A) followed by TEV protease cleavage at 4 °C to release the Rix1 
particle. The TEV eluate (that is, the released Rix1 particle) was incubated with 
4 mM ATP or 4 mM GTP at 23 °C for 1 h. After ATP or GTP treatment, the 60S 
particle was re-purified via affinity purification of RpL3-Flag using Flag beads. 
Buffer KCl100 (50 mM Tris, pH 8.0, 100 mM KCl, 10 mM MgCl2 and 1 mM DTT) 
or buffer NaCl100 (50 mM Tris, pH 8.0, 100 mM NaCl, 10 mM MgCl2 and 1 mM 
DTT) were used. Affinity purifications were performed as previously described. 
Immunodepletion of Rix1 by Flag immunoprecipitation. The Nug2 particle 
was affinity purified from yeast strain (Nug2-TAP, Rix1-Flag) via IgG beads. The 
TEV eluate was incubated twice with Flag beads at 4 °C for 30 min each to deplete 
the Rix1-associated Nug2 particle. The flow-through was used for the final 
calmodulin purification step. 

Miscellaneous. Further methods used in this study and previously described were 
TAP purifications of pre-60S particles, sucrose gradient analysis to obtain 
ribosomal and polysomal profiles, ribosomal export assays using the large subunit 
reporter Rpl25-eGFP monitored by fluorescence microscopy and yeast 
three-hybrid analysis. Antibodies used for western analysis in the following dilutions 
were anti-Nug2 (ref. 38) 1:10,000; anti-Rsa4 (ref. 39) 1:10,000; anti-Nmd3 (ref. 15) 
1:5,000; anti-Mex67/Mtr2 (ref. 40) 1:10,000; anti-RpL35 (ref. 41) 1:35,000; 
anti-RpL3 (ref. 42) 1:5,000; anti-Nsa2 (ref. 43) 1:10,000; anti-RpL10 (ref. 44) 1:1,000; 
anti-Nog1 (ref. 38) 1:30,000; anti-Rei1 (ref. 45) 1:5,000; goat-anti-mouse 1:3,000 
(170-6516) and mouse-anti-rabbit horseradish-peroxidase-conjugated antibodies 
1:3,000 (170-6515, both BioRad). PageRuler unstained protein ladder (Thermo 
Scientific) was used as a protein marker; Brilliant Blue G-Colloidal Concentrate 
Electrophoresis Reagent (Sigma-Aldrich) was used for Coomassie stain, and 
4-12% NuPAGE Bis-Tris gels (Novex) together with NuPAGE MOPS SDS run- 
ning buffer (Invitrogen) were used for SDS-PAGE. Wild-type yeast strain W303 
(ref. 46) and the yeast three-hybrid vector p3A-MS2-1 (ref. 47) were used. 
Extended Data Figure 1 | Multiple sequence alignment of various Nug2 
orthologues. Multiple sequence alignment of YlqF (bacterial homologue of 
Nug2; Bacillus cereus), ctNug2, DmNug2 (Drosophila melanogaster), DrNug2 
(Danio rerio), HsNug2 (Homo sapiens), KlNug2 (Kluyveromyces lactis), 
MmNug2 (Mus musculus), ScNug2 (S. cerevisiae), SpNug2 (S. pombe), XlNug2 
(Xenopus laevis) and YlNug2 (Yarrowia lipolytica), using T-Coffee multiple 
sequence alignment (http://www.ebi.ac.uk/Tools/msa/tcoffee) and Jalview. 
Indicated above the alignment are the different Nug2 domains including the N, 
G, and C domains and the C-terminal extension. Moreover, the DAR, G1, G3, 
G4 motifs, point mutation sites (in red) and truncated site of ctNUG2-510 
amino acids (truncation of the non-conserved C-terminal extension; red line) 
are indicated. 
Extended Data Figure 2 | Nug2 and Nmd3 are not found on the same pre-60S 
particles. The indicated TAP-tagged bait proteins were affinity 
purified from yeast wild-type cells. The final eluates were analysed by 
SDS-PAGE and Coomassie staining (top), and by western blotting using the 
indicated antibody (bottom). Asterisks mark the position of each bait protein. 
Rea1 has been identified by mass spectrometry. All affinity purifications and 
western analyses were performed at least twice, yielding highly reproducible 
data sets. 

Extended Data Figure 3 | ctNug2 can complement the lethal phenotype of a 
nug2Δ null mutant. Serial dilutions of the yeast Nug2 shuffle strain (MATa, 
ade2, ade3, his3, ura3, leu2, trp1, nug2::kanMX4, pHT4467-NUG2) 
transformed with empty plasmid, yeast ScNUG2, ctNUG2 or ctNUG2-510 
(truncation of the non-conserved C-terminal extension; see Extended Data 
Fig. 1) under the control of the constitutive ADH promoter in a 
single-copy-number (YCplac111) or multi-copy-number (pRS425) plasmid (see 
Supplementary Table 2) were spotted on SDC−Leu (loading control) and SDC 
plates containing 5-FOA at indicated temperatures for 6 days. Note that ctNug2 
only partially complements the nug2 null mutant. 
Extended Data Figure 4 | Mutations in ATP-binding or MIDAS domain of 
Rea1 inhibit the release of Rsa4 and Nug2 from the pre-60S particle. 
a, b, Wild-type REA1 and the rea1 mutants mapping in the ATP-binding site of 
the AAA2 domain (Lys659Ala) or in the MIDAS domain (DAA) were 
N-terminally tagged with eGFP and expressed in a REA1 shuffle strain (a) or 
overexpressed under the control of the inducible GAL1-10 promoter in REA1 
wild-type strain DS1-2b (b). Transformants were spotted in tenfold serial 
dilution steps on the indicated plates and incubated at 30 °C for 3 days. 
Neither rea1 mutant allele complements the rea1 null strain (a, SDC + 
5-FOA), and both cause a dominant-negative phenotype after overexpression by 
replacing endogenous Rea1 (b, galactose). c, Overnight pre-cultures were 
grown in SRC−Leu to prevent plasmid loss, followed by shifting cells 
(A600 nm = 0.75) to galactose medium (YPG) for 7 h. Rix1 particles, which were 
affinity purified from a Rix1-TAP, RpL3-Flag strain containing either 
endogenous wild-type or overexpressed wild-type eGFP-Rea1, eGFP-Rea1(DAA) 
and eGFP-Rea1(Lys659Ala), were incubated with or without 
4 mM ATP in KCl buffer, before the different in vitro matured pre-60S particles 
were re-isolated by affinity-purification via the RpL3-Flag on Flag beads. 
Subsequently, the in vitro matured pre-60S particles (eluates) were analysed by 
SDS-PAGE and Coomassie staining. Relevant bands are indicated on the right. 
Note that in the case of the rea1 mutants, the release of Nug2, Rsa4 but also Rea1 
and the Rix1 complex is significantly inhibited. All in vitro assays were 
performed at least twice, yielding highly reproducible data sets. 
Extended Data Figure 5 | Nug2 depletion assay using the auxin-inducible 
degron system. a, Growth of Nug2 auxin degron strains (sAid-Nug2-sAid) in 
the PADH-OsTIR1 background on YPD plates with or without 500 μM auxin 
(IAA). The cell growth of sAid-Nug2-sAid strain was inhibited by the addition 
of auxin. b, Western blotting of sAid-Nug2-sAid after auxin treatment. The 
depletion of sAid-Nug2-sAid occurred within about 30 min of auxin addition. 
Extended Data Table 1 | Yeast strains used in this study 


Name Genotype Source 
W303 Mata, ade2-1, his3-11,15, leu2-3,112, trp1-1, ura3-1, can1-100 ref.*° 
DS1-2b MATa, his3-A200, leu2-A1, trp1-A63, ura3-52 ref.”! 
Nug2-HTpA W303, Mata, NUG2-HTpA::His3MX6 This work 
Nmd3-HTpA W303, Mata, NUD3-HTpA: :His3MX6 This work 
Mata, his3-200, ura3-52, leu2-3, 112, trpl-1, ade2, LYS2::(lexAop)- ae 
pen eee HIS3, LexA-MS2 coat (TRP/) ic 
Nug2 Shuffle strain Mata, his3, ura3, leu2, trp1, lys2, nug2::kanMX4, pRS416-NUG2 ref.*> 
Nug2 Shuffle strain ( for Mata, ade2, ade3, his3, ura3, leu2, trp1, nug2::kanMX4 , pHT4467- ae 
: ref.” 
ctNug2 complementation) NUG2 
F Mata, his3, trp1, leu2, ura3, LYS2, ADE2, ADE3, NMD3::His3MX6, 14 
Nmd3 Shuffle strain pRS316 (URA3) Nmd3, ref. 
: Mata, his3, ura3, leu2, trp1, lys2, nug2::kanMX4, SSF1-TAP::TRP1, ; 
Nug2 Shuffle strain, Ssf1-TAP pRS416-NUG2 This work 
: Mata, his3, ura3, leu2, trp1, lys2, nug2::kanMX4, NSA1-TAP::TRP1, : 
Nug2 Shuffle strain, Nsal-TAP pRS416-NUG2 This work 
Meee Mata, his3, ura3, leu2, trp1, lys2, nug2::kanMX4, RIX1-TAP::TRP1, : 
Nug2 Shuffle strain, Rix1-TAP pRS416-NUG2 This work 
: Mata, his3, ura3, leu2, trp1, lys2, nug2::kanMX4, ARX1-TAP::TRP1, ? 
Nug2 Shuffle strain, Arx1-TAP pRS416-NUG2 This work 
; Mata, his3, ura3, leu2, trp1, lys2, nug2::kanMX4, LSGI-TAP::TRP1, : 
Nug2 Shuffle strain, Lsg1-TAP pRS416-NUG2 This work 
Nug2 Shuffle strain, Rixl-TAP, Mata, his3, ura3, leu2, trp!, lys2, nug2::kanMX4, RIXI-TAP::TRP1, aiemmat 
L3-Flag RPL3-FLAG::natNT2, pRS416-NUG2 
Rix1-TAP, L3-Flag DS1-2b, Mata, RIXI-TAP::TRP1, RPL3-FLAG::natNT2 This work 
Real shuffle strain W303 Mata real::kanMX6 YCG-YLR106c ref.”! 
. Mata, his3, ura3, leu2, trp!, lys2, nug2::kanMX4 , rsa4::His3MX4, : 
Nug2 Rsa4 double shuffle strain pRS416-NUG2, pRS316-RSA4 This work 
Aug2 Real double <tiniite strain Mata, his3, ura3, leu2, trp1, lys2, nug2::kanMX4, real ::kanMX4, This work 


pRS416-NUG2, pRS316-REA1 
Extended Data Table 2 


Name 

pACTII-NUG2 
p3A-MS82-1 
p3A-MS2-25SH25 
p3A-MS2-25SH38 
p3A-MS2-25SH69-71 
p3A-MS2-25SH89 
p3A-MS2-25SH90-92 
pCM185-NUG2 
pCM185-nug2K328R 
pCM185-nug2G369A 
pCM190-NUG2 
pCM190-nug2K328R 


pCM190-nug2G369A 
pRS313-NUG2 
pRS313-nug2K328R 
pRS313-nug2G369A 
pRS315-NUG2-TAP 
pRS315-nug2K328R-TAP 
pRS314-NMD3-TAP 
YCplac111-GFP-REA1 
YCplac111-GFP-realDAA 
YCplac111-GFP-REAIK659A 
YCplac111G-GFP-REA1 
YCplac111G-GFP-realDAA 
YCplac111G-GFP-REA1K659A 
YCplac111-PapH-NUG2 
YCplac111-PapH-ctNUG2 
YCplac111-Papy-ctNug2-510 


PpRS425-P apy -Flag-ctNUG2-510 
pRS314-RFP-NOPI - RPL25-GFP 


pRS314-RFP-NOPI - RPS3-GFP 
pRS314-rsa4-1 
YCplac22-realDTS 


pRS313-sAid-Nug2-sAid, Papy-OsTIR1 


pET-ctNug2-510-His¢ 
pET-ctNugK339R-510-His¢ 
pET-ctNug2-G380A-510-His6 


Plasmids used in this study 


Features 

2u, LEU2, P apm, Tapa, G4AD-NUG2 

2u, URA3, ADE2, P pou, MS2 sites, T poi 

2u, URA3, ADE2, P poi, MS2-25SH25 sites, T pou 
2u, URA3, ADE2, P po, MS2-25SH38, T pon 
2u, URA3, ADE2, P pou, MS2-25SH69-71, T pout 
2u, URA3, ADE2, P poi, MS2-25SH89, T pour 
2u, URA3, ADE2, P pou, MS2-25SH90-92, T pout 
CEN, TRP1, P je~7-NUG2 

CEN, TRP1, P ewo7-nug2K328R 

CEN, TRP1, P tero7-nug2g369A 

2u, URA3, P eo7-NUG2 

2u, URA3, Piewo7-nug2K328R 

2u, URA3, Preio7-nug2G369A 

CEN, HIS3, NUG2 

CEN, HIS3 , nug2K328R 

CEN, HIS3, nug2G369A 

CEN, LEU2, NUG2-TAP 

CEN, LEU2, nug2K328R-TAP 

CEN, TRP1, NMD3-TAP 

CEN, LEU2, GFP-REAI 

CEN, LEU2, GFP-realDAA 

CEN, LEU2, GFP-realK659A 

CEN, LEU2, PGari-10-GFP-REA1 

CEN, LEU2, PGari-10-GFP-realDAA 

CEN, LEU2, PGari-10-GFP-real K659A 

CEN, LEU2, Papni-NUG2 

CEN, LEU2, P spii-ctNUG2 

CEN, LEU?2, P apii-ctNUG2-510 (1-510aa) 


2u, LEU2, P apmi-ctNUG2-510 (1-510aa) 
CEN, TRP1, mRFP-NOP1, RPL25-eGFP 


CEN, TRP1, mRFP-NOP1, RPS3-eGFP 

CEN, TRP1, rsa4-1 

CEN, TRP1, real-DTS 

CEN, HIS3, sAid-NUG2-sAid, Papni-OsTIR1-myco 

E. coli expression vector, Amp, ctNUG-His¢(1-510aa) 

E. coli expression vector, Amp, ctnug2K339R-His 6 (1-510aa) 


E. coli expression vector, Amp, ctnug2G380A-His¢ (1-510aa) 
This study 

This study 

This study 

This study 
This study 

This study 

This study 

This study 

This study 

This study 

This study 

This study 
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Ref.”! & This Study 
Ref.”! & This Study 
This study 


This study 
Ref.7! 


Ref.?! 

Ref.?! 

Ref.”! & This Study 
This study 

This study 

This study 


This study 




Extended Data Table 3 | Adapters used for the CRAC experiments 


Name                       Sequence

5' linkers:
Nug2 CRAC                  5'-invddT-GTTCArGrArGrUrUrCrUrArCrArGrUrCrCrGrArCrGrArUrC-OH-3'
Nmd3 CRAC 1                5'-invddT-ACACrGrArCrGrCrUrCrUrUrCrCrGrArUrCrUrNrNrNrUrCrUrCrUrArGrC-OH-3'
Nmd3 CRAC 2                5'-invddT-ACACrGrArCrGrCrUrCrUrUrCrCrGrArUrCrUrNrNrNrCrArCrUrArGrC-OH-3'

3' linker:
miRCat                     5'-AppTGGAATTCTCGGGTGCCAAGddC-3'

PCR oligos used:
P5_forward                 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'
Pairedendmircat Reverse    5'-CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGGCCTTGGCACCCGAGAATTCC-3'


5'-invddT denotes an inverted dideoxy thymidine; N denotes random nucleotide sequences. 
N6-methyladenosine-dependent regulation of messenger RNA stability 


Xiao Wang’, Zhike Lu', Adrian Gomez!, Gary C. Hon?, Yanan Yue’, Dali Han’, Ye Fu’, Marc Parisien’, Qing Dai1, Guifang Jia1,4, 


Bing Ren’, Tao Pan? & Chuan He! 


N6-methyladenosine (m6A) is the most prevalent internal (non-cap) 
modification present in the messenger RNA of all higher eukaryotes. 
Although essential to cell viability and development, the exact role 
of m6A modification remains to be determined. The recent discovery 
of two m6A demethylases in mammalian cells highlighted the importance 
of m6A in basic biological functions and disease. Here we 
show that m6A is selectively recognized by the human YTH domain 
family 2 (YTHDF2) ‘reader’ protein to regulate mRNA degradation. 
We identified over 3,000 cellular RNA targets of YTHDF2, most of 
which are mRNAs, but which also include non-coding RNAs, with a 
conserved core motif of G(m6A)C. We further establish the role of 
YTHDF2 in RNA metabolism, showing that binding of YTHDF2 
results in the localization of bound mRNA from the translatable 
pool to mRNA decay sites, such as processing bodies. The 
carboxy-terminal domain of YTHDF2 selectively binds to m6A-containing 
mRNA, whereas the amino-terminal domain is responsible for the 
localization of the YTHDF2-mRNA complex to cellular RNA decay 
sites. Our results indicate that the dynamic m6A modification is 
recognized by selectively binding proteins to affect the translation 
status and lifetime of mRNA. 

Messenger RNA is central to the flow of genetic information. Regulatory 
elements (for example, AU-rich element, iron-responsive element), 
in the form of short sequence or structural motifs imprinted in mRNA, 
are known to control the time and location of translation and degradation 
processes. Reversible and dynamic methylation of mRNA 
could add another layer of more sophisticated regulation to the primary 
sequence. m6A, a prevalent internal modification in the messenger 
RNA of all eukaryotes, is post-transcriptionally installed by m6A 
methyltransferase (for example, MT-A70, Fig. 1a) within the consensus 
sequence of G(m6A)C (70%) or A(m6A)C (30%). The loss of MT-A70 
leads to apoptosis in human HeLa cells, and significantly impairs 
development in Arabidopsis and in Drosophila. Our recent discoveries 
of m6A demethylases FTO (fat mass and obesity-associated protein) 
and ALKBH5 demonstrate that this RNA methylation is reversible 
and may dynamically control mRNA metabolism. The recently revealed 
m6A transcriptomes (methylome) in human cells and mouse tissues 
showed m6A enrichments within long exons and around stop codons, 
further suggesting fundamental regulatory roles of m6A. However, 
despite this progress the exact function of m6A remains to be 
elucidated. 

Whereas methyltransferase may serve as the ‘writer’ and demethylases 
(FTO and ALKBH5) act as the ‘eraser’ of m6A on mRNA, potential 
m6A-selective-binding proteins could represent the ‘reader’ of the m6A 
modification and exert regulatory functions through selective recognition 
of methylated RNA. Here, we show that the YTH-domain family 
member 2 (YTHDF2), initially found in pull-down experiments using 
m6A-containing RNA probes, selectively binds m6A-methylated 
mRNA and controls RNA decay in a methylation-dependent manner. 


The YTH domain family is widespread in eukaryotes and known to 
bind single-stranded RNA with the conserved YTH domain (>60% 
identity) located at the C terminus. In addition to previously reported 
YTHDF2 and YTHDF3, we also discovered YTHDF1 as another m6A-selective 
binding protein by using methylated RNA bait containing the 
known consensus sites of G(m6A)C and A(m6A)C versus unmethylated 
control (Extended Data Fig. 1a). Further, highly purified poly(A)-tailed 
RNAs were incubated with recombinant glutathione-S-transferase 
(GST)-tagged YTHDF1-3 and then separated by GST-affinity column. 
By using a previously reported liquid chromatography-tandem mass 
spectrometry (LC-MS/MS) method, we found that the m6A-containing 
RNAs were greatly enriched in the YTHDF-bound portion and diminished 
in the flow-through portion (Fig. 1b and Extended Data Fig. 1b). 
Gel-shift assay revealed that YTHDF2 has a 16-fold higher binding 
affinity for the methylated probe than for the unmethylated one, as 
well as a slight preference for the consensus sequence (Extended Data 
Fig. 1c, d). This protein was selected for subsequent characterization 
because it has a high selectivity for m6A, and was thought to be associated 
with human longevity. 
Figure 1 | YTHDF2 selectively binds m6A-containing RNA. a, Illustration of 
m6A methyltransferase, demethylase and binding proteins. RRACH is the 
extended m6A consensus motif, where R is G or A and H is not G. b, LC-MS/MS 
showing m6A enrichment in GST-YTHDF2-bound mRNA while depleted in 
the flow-through portion. Error bars, mean ± s.d., n = 2, technical replicates. 
c, Overlap of peaks identified through YTHDF2-based PAR-CLIP and the 
m6A-seq peaks in the same cell line. d, Binding motif identified by MEME with 
PAR-CLIP peaks (P = 3.0 X 10 *°, 381 sites were found under this motif out 
of top 1,000 scored peaks). e, Pie chart depicting the region distribution of 
YTHDF2-binding sites identified by PAR-CLIP, TSS (200-bp window from 
the transcription starting site), stop codon (400-bp window centred on 
stop codon). 


1Department of Chemistry and Institute for Biophysical Dynamics, The University of Chicago, 929 East 57th Street, Chicago, Illinois 60637, USA. 2Ludwig Institute for Cancer Research, Department of 
Cellular and Molecular Medicine, UCSD Moores Cancer Center and Institute of Genome Medicine, University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, California 92093-0653, 
USA. 3Department of Biochemistry and Molecular Biology and Institute for Biophysical Dynamics, The University of Chicago, 929 East 57th Street, Chicago, Illinois 60637, USA. 4Department of Chemical 
Biology and Synthetic and Functional Biomolecules Center, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China. 
We next applied two independent methods to identify RNAs that 
are the binding partners of YTHDF2: (1) photoactivatable ribonucleoside 
crosslinking and immunoprecipitation (PAR-CLIP) to locate 
the binding sites of YTHDF2; (2) sequencing profiling of the RNA 
of immunopurified ribonucleoprotein complex (RNP) (RIP-seq) 
to extract cellular YTHDF2-RNA complexes. Approximately 10,000 
crosslinked clusters covering 3,251 genes were identified in PAR-CLIP 
(Extended Data Fig. 2a, b). Most are mRNA but 1% are non-coding 
RNA. Among 2,536 transcripts identified in RIP-seq, 50% overlap with 
PAR-CLIP targets (Extended Data Fig. 2b). We also performed m6A-seq 
for the poly(A)-tailed RNA from the same HeLa cell line and found 
that 59% (7,345 out of 12,442) of the PAR-CLIP peaks of YTHDF2 
overlap with m6A peaks (Fig. 1c). As shown in Fig. 1d, the conserved 
motif revealed from the top 1,000 scored clusters matches the m6A 
consensus sequence of RRACH, which strongly supports the binding 
of m6A by YTHDF2 inside cells (see more motifs in Extended Data 
Fig. 2c-e). Coinciding with the previously reported pattern of m6A 
peaks, YTHDF2 PAR-CLIP peaks showed enrichment near the 
stop codon and in long exons (Extended Data Fig. 2f-h). YTHDF2 
predominantly targets the stop codon region, the 3’ untranslated 
region (3’ UTR), and the coding region (CDS) (Fig. 1e), indicating that 
YTHDF2 may have a role in mRNA stability and/or translation. 

To dissect the role of YTHDF2 we used ribosome profiling to assess 
the ribosome loading of each mRNA represented as ribosome-protected 
reads. HeLa cells that were treated with YTHDF2 short interfering
RNA (siRNA) (Extended Data Fig. 3a) as well as siRNA control were 
subsequently subjected to ribosome profiling with mRNA sequencing 
(mRNA-seq) performed on the same sample. Transcripts present 
(reads per kilobase per million reads (RPKM) > 1) in both ribosome 
profiling and mRNA-seq samples were analysed. These transcripts 
were then categorized as YTHDF2 PAR-CLIP targets (3,251), common 
targets of PAR-CLIP and RIP (1,277), and non-targets (3,905, absent 
from PAR-CLIP and RIP). A significant increase of input mRNA reads 
for YTHDF2 targets was observed in the YTHDF2 knockdown sample 
compared to the control (P < 0.001, Mann-Whitney U-test), without a 
noticeable change for non-targets (Fig. 2a). However, compared with 
the increase in mRNA level, the differences in the ribosome-protected 
fraction in the knockdown sample compared to the control were small 
(Fig. 2b). Thus, YTHDF2 knockdown led to apparently reduced trans- 
lation efficiency of its targets as a result of accumulation of non-translating 
mRNA (Extended Data Fig. 3b), suggesting the primary role of YTHDF2 
in RNA degradation. 

Next, we performed RNA lifetime profiling by collecting and ana- 
lysing RNA-seq data on YTHDF2 knockdown and control samples 
obtained at different time points after transcription inhibition with 
actinomycin D. Indeed, YTHDF2 knockdown led to prolonged (~30%
on average) lifetimes of its mRNA targets in comparison with non-targets
(Fig. 2c). Interestingly, we found that as the number of binding
sites increases, the stabilization of the RNA targets caused by YTHDF2
knockdown also increases significantly: targets with more than four sites
are stabilized to a larger extent upon YTHDF2 knockdown than those with
2-4 sites, which in turn have larger fold changes than targets with only
one site (Fig. 2d and Extended Data Fig. 3c, Kruskal-Wallis test,
P < 0.0001); however, transcripts grouped according to binding region
show similar fold changes that are statistically indistinguishable
(Extended Data Fig. 3c, d).

Three pools of mRNAs exist in cytoplasm as defined by their engagement
in translation (Fig. 2e): non-ribosome mRNPs (mRNA-protein
particles, with sedimentation coefficients of 20S-35S in sucrose gradient),
translatable mRNA pool associated with ribosomal subunits (40S-80S),
and actively translating polysome (>80S). YTHDF2 was observed to
be present in the non-ribosome fraction (Fig. 2e). After YTHDF2 knockdown,
a 21% increase of the m⁶A/A ratio of the total mRNA was
observed (Fig. 2f), confirming that the presence of YTHDF2 destabilizes
the m⁶A-containing mRNA. YTHDF2 could promote the relocation of
m⁶A-containing mRNA from the translatable pool to mRNPs. If so,
the amount of methylated mRNA should decrease in mRNPs and

Figure 2 | YTHDF2 destabilizes its cognate mRNAs. a-c, Cumulative
distribution of mRNA input (a), ribosome-protected fragments (b), and
mRNA lifetime log2 fold changes (Δ, c) between siYTHDF2 (YTHDF2
knockdown) and siControl (knockdown control) for non-targets (grey),
PAR-CLIP targets (blue), and common targets of PAR-CLIP and RIP (red).
d, The mRNA lifetime log2 fold changes were further grouped and analysed on
the basis of the number of CLIP sites on each transcript. The increased binding
of YTHDF2 on its target transcript correlates with reduced mRNA lifetime.
P values were calculated using two-sided Mann-Whitney or Kruskal-Wallis
test (rank-sum test for the comparison of two or multiple samples,
respectively). Detailed statistics are presented in Extended Data Fig. 3c.
e, Western blotting of Flag-tagged YTHDF2 on each fraction of 10-50%
sucrose gradient showing that YTHDF2 does not associate with ribosome. The
fractions were grouped to non-ribosome mRNPs, 40S-80S, and polysome.
f, Quantification of the m⁶A/A ratio of the total mRNA, non-ribosome portion,
40S-80S, and polysome by LC-MS/MS. Noticeable increases of the m⁶A/A ratio
of the total mRNA, mRNA from 40S-80S, and mRNA from polysome were
observed in the siYTHDF2 sample compared to control after 48 h. A reduced
m⁶A/A ratio of mRNA isolated from the non-ribosome portion was observed in
the same experiment. P values were determined using two-sided Student's t-test
for paired samples. Error bars, mean ± s.d., for poly(A)-tailed total mRNA
input, n = 10 (five biological replicates × two technical replicates), and for
the rest, n = 4 (two biological replicates × two technical replicates).


increase in the translatable pool upon YTHDF2 knockdown. Indeed,
after YTHDF2 knockdown, the m⁶A/A ratio of mRNA isolated from
mRNPs showed a 24% decrease and the ratio from the translatable pool
demonstrated a 46% increase (Fig. 2f). We also observed a 14% increase
of the m⁶A/A ratio of mRNA isolated from polysome after YTHDF2
knockdown (Fig. 2f), although it is worth noting that this model
provides no prediction of the behaviour of polysome because the
ribosome-loading number per transcript depends on the availability of
both mRNA and free ribosomes. It should also be noted that the
observed m⁶A/A ratio change does not seem to result from a change in the
protein levels of methyltransferase and demethylase, as detected by western
blotting (Extended Data Fig. 3e).

Three YTHDF2-targeted RNAs were selected for further validation:
the SON mRNA has multiple CLIP peaks in CDS, the CREBBP mRNA
has CLIP peaks at 3’ UTR, and a non-coding RNA PLAC2 (Extended
Data Fig. 4a-d). As detected by gene-specific PCR with reverse transcription
(RT-PCR), after 48 h of YTHDF2 knockdown, all three RNA transcripts
increased by more than 60% with prolonged lifetime; both SON
and CREBBP showed redistribution from non-ribosome mRNP to the
translatable pool (Extended Data Fig. 4e-n). Furthermore, knockdown of
the known m⁶A methyltransferase MT-A70 led to noticeably reduced
binding of YTHDF2 to its targets and increased stability of the targets
similar to that of the YTHDF2 knockdown (Extended Data Fig. 5).

To gain mechanistic understanding of the YTHDF2-mRNA interaction,
we analysed the cellular distribution of YTHDF2 and found
that YTHDF2 co-localizes with three markers (DCP1a, GW182 and
DDX6) of processing bodies (P bodies) in the cytoplasm, where mRNA
decay occurs (Extended Data Fig. 6a-j). YTHDF2 is composed of
a C-terminal RNA-binding domain (C-YTHDF2) and a P/Q/N-rich
N terminus (N-YTHDF2, Fig. 3a and Extended Data Fig. 6k).
Whereas overexpression of YTHDF2 led to a reduced m⁶A/A ratio of
the total mRNA, overexpression of either N-YTHDF2 or C-YTHDF2
yielded an increased m⁶A/A ratio (Fig. 3b), indicating that both
domains are required for the YTHDF2-mediated mRNA decay. An
in vitro pull-down experiment further showed that purified C-YTHDF2
is able to enrich m⁶A-containing mRNA from total mRNA (Extended
Data Fig. 6l). The spatial distribution of the SON mRNA relative to
YTHDF2 and the N- and C-YTHDF2 truncates was examined by fluorescence
in situ hybridization (FISH) and fluorescence immunostaining
in HeLa cells (Fig. 3c-e). The location of the SON mRNA showed a
strong correlation with that of the full-length YTHDF2 (Fig. 3c) and
C-YTHDF2 (Fig. 3e). In contrast, a much lower correlation was
observed for the SON mRNA with N-YTHDF2 (Fig. 3d). In addition,
the full-length YTHDF2 and N-YTHDF2 co-localized with DCP1a,
whereas C-YTHDF2 did so to a much lesser extent, indicating the role
of N-YTHDF2 in P-body localization. Furthermore, the overexpression
of C-YTHDF2 led to a reduced co-localization of the SON mRNA
with DCP1a (Fig. 3e).

In further support of this mechanism, N-YTHDF2 was fused with
λ peptide (N-YTHDF2-λ), which recognizes Box B RNA with a high
affinity in a tether reporter assay. Tethering N-YTHDF2-λ to
F-luc-5BoxB (five Box B sequences were inserted into the 3’ UTR of
the mRNA reporter) led to a significantly reduced mRNA level (Fig. 3f)
and shortening (40%) of its lifetime compared with tethering controls
of N-YTHDF2 or λ alone (Extended Data Fig. 7a-e). The reporter
mRNA bound by N-YTHDF2-λ possesses a shorter poly(A) tail
in comparison with the unbound portion, although a significant change of
the deadenylation rate was not observed (Extended Data Fig. 7f-).
Together with the observation that YTHDF2 co-localizes with both
deadenylation and decapping enzyme complexes (Extended Data
Fig. 6), we propose a model (Fig. 3g) in which: (1) C-YTHDF2
selectively recognizes m⁶A-containing mRNA less engaged with translation;
(2) this binding of YTHDF2 to methylated mRNA happens in
parallel with or at a later stage of deadenylation; (3) N-YTHDF2 localizes
the YTHDF2-m⁶A-mRNA complex to more specialized mRNA decay
machineries (P bodies etc.) for committed degradation.

Functional clustering of YTHDF2 targets versus non-targets revealed 
that the main functions of YTHDF2-mediated RNA processing are 
gene expression (molecular function) as well as cell death and survival 
(cellular function, Extended Data Fig. 8a-d). After 72 h of YTHDF2
knockdown, the viability of HeLa cells was reduced by 50% (Extended Data
Fig. 8e, f), indicating that the YTHDF2-mediated RNA processing 
could have biological significance. 

In summary, we present a transcriptome-wide identification of 
YTHDF2-RNA interaction and a mechanistic model for m°A function 


[Figure 3 graphic. Recoverable panel labels: domain schematic of N-YTHDF2 and C-YTHDF2 on a 0-500 aa scale (a); mRNA level for F-luc-5BoxB and F-luc (f); model states "Translatable pool", "Free ribosomal subunit", "Actively translating polysome" and "Non-ribosome mRNPs: P body etc." (g).]


Figure 3 | YTHDF2 affects SON mRNA localization in processing body (P-body).
a, Schematic of the domain architecture (aa, amino acids) of YTHDF2,
N terminus of YTHDF2 (N-YTHDF2, aa 1-389, blue) and C terminus of
YTHDF2 (C-YTHDF2, aa 390-end, red). b, Overexpression of full-length
YTHDF2 led to reduced levels of m⁶A after 24 h, whereas overexpression of
N-YTHDF2 or C-YTHDF2 increased the m⁶A/A ratio of the total mRNA.
P values were determined using two-sided Student’s t-test for paired samples.
Error bars, mean ± s.d., n = 4 (two biological replicates × two technical
replicates). c-e, Fluorescence in situ hybridization of SON mRNA and
fluorescence immunostaining of DCP1a (P-body marker), Flag-tagged
YTHDF2 (c), Flag-tagged C-YTHDF2 (d), and Flag-tagged N-YTHDF2
(e). Full-length YTHDF2 and C-YTHDF2 co-localize with SON mRNA
(bearing m⁶A) and the full-length YTHDF2 significantly increases the P-body
localization of SON mRNA compared to N-YTHDF2 and C-YTHDF2. The
numbers shown above figures are Pearson correlation coefficients of each
channel pair with the scale of the magnified region (white frame) set as
2 µm × 2 µm. f, Tethering N-YTHDF2-λ to an mRNA reporter F-luc-5BoxB led
to a ~40% reduction of the reporter mRNA level compared to tethering
N-YTHDF2 or λ alone (green) and controls without BoxB (F-luc, yellow).
P values were determined using two-sided Student’s t-test for paired samples.
Error bars, mean ± s.d., n = 6 (F-luc-5BoxB) or 3 (F-luc). g, A proposed
model of m⁶A-dependent mRNA degradation mediated through YTHDF2.
The three states of mRNAs in cytoplasm are defined by their engagement with
ribosome using the sedimentation coefficient range in sucrose gradient: >80S
for actively translating polysome; 40S-80S for translatable pool; 20S-35S for
non-ribosome mRNPs.


mediated by this m⁶A-binding protein, as the first functional demonstration
of an m⁶A reader protein. We show that YTHDF2 alters the distribution
of the cytoplasmic states of several thousand m⁶A-containing
mRNAs. The present work demonstrates that reversible m⁶A deposition
could dynamically tune the stability and localization of the target
RNAs through m⁶A ‘readers’.


METHODS SUMMARY 


m⁶A profiling, PAR-CLIP and RIP experiments were conducted as previously
reported. For ribosome profiling, ribosome-protected fragments (RPF) were
obtained by micrococcal nuclease digestion followed by sucrose gradient
(10-50%) separation. Complementary DNA
libraries of RPF and mRNA input were constructed as previously described. In
RNA lifetime profiling, actinomycin D (5 µg ml⁻¹) was added to stop transcription,
and samples at 0, 3 and 6 h decay were collected. ERCC RNA spike-in control
(Ambion) was added to each sample before the isolation of mRNA and library
construction to correct for the decrease of the whole mRNA population during RNA
decay. All of the cDNA libraries were sequenced by using HiSeq 2000 (Illumina,
single end, 100 bp) and at least two replicates were performed for each experiment
(Extended Data Table 1). The deep sequencing data were mapped to human
genome version hg19 without any gaps and allowing at most two mismatches.
The PAR-CLIP binding sites were identified through kernel density estimation of
T to C conversions. For RIP, transcripts that have more than twofold enrichment
were identified as targets. For ribosome profiling and mRNA lifetime profiling, the
averages of the log2(siYTHDF2/siControl) values generated from two biological
replicates were analysed, and comparisons of independent replicates are summarized
in Extended Data Fig. 9.
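
The averaging of log2(siYTHDF2/siControl) values over two biological replicates can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the dictionary layout (transcript name to RPKM) and the choice to apply the RPKM > 1 presence filter to every replicate are assumptions made for the example.

```python
import math


def mean_log2_fold_change(kd_reps, ctrl_reps, min_rpkm=1.0):
    """Average log2(knockdown/control) per transcript across paired replicates.

    kd_reps and ctrl_reps are lists of dicts (one dict per biological
    replicate) mapping a transcript name to its RPKM value. The layout and
    the per-replicate filter are assumptions for illustration only.
    """
    # Keep only transcripts measured in every replicate of both conditions.
    shared = set.intersection(*(set(rep) for rep in kd_reps + ctrl_reps))
    result = {}
    for transcript in shared:
        pairs = [(kd[transcript], ctrl[transcript])
                 for kd, ctrl in zip(kd_reps, ctrl_reps)]
        # Presence filter: require RPKM > min_rpkm in every sample.
        if any(k <= min_rpkm or c <= min_rpkm for k, c in pairs):
            continue
        result[transcript] = sum(math.log2(k / c) for k, c in pairs) / len(pairs)
    return result
```

A positive value means the transcript is more abundant in the knockdown sample; transcripts that fail the presence filter in any replicate are simply left out of the result.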


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Plasmid construction and protein expression. Recombinant YTHDF1-3 were 
cloned from commercial cDNA clones (Open Biosystems) into vector pGEX-4T-1. 
The primers used for subcloning (from 5’ to 3’; F stands for forward primer; R 
stands for reverse primer) are listed: GST-YTHDF1-F, CGATCGAATTCATG 
TCGGCCACCAGCG; GST-YTHDF1-R, CCATACTCGAGTCATTGTTTGTTTT
CGACTCTGCC; GST-YTHDF2-F, CGTACGGATCCATGTCAGATTCCTACT 
TACCCAG; GST-YTHDF2-R, CGATGCTCGAGTCATTTCCCACGACCTTG 
ACG; GST-YTHDF3-F, CGTACGGATCCATGTCAGCCACTAGCGTG; GST- 
YTHDF3-R, CGTAGCTCGAGTCATTGTTTGTTTCTATTTCTCTCCCTAC. 

The resulting clones were transformed into the Escherichia coli strain BL21 and
expression was induced at 16 °C with 1 mM IPTG for 20 h. The pellet collected
from 2 litres of bacterial culture was then lysed in 30 ml PBS-L solution (50 mM
NaH2PO4, 150 mM NaCl, pH 7.2, 1 mM PMSF, 1 mM DTT, 1 mM EDTA, 0.1%
(v/v) Triton X-100) and sonicated for 10 min. After removing cell debris by
centrifugation at 17,000g for 30 min, the supernatant was loaded onto a GST superflow
cartridge (Qiagen, 5 ml) and eluted with a gradient using PBS-EW (50 mM
NaH2PO4, 150 mM NaCl, pH 7.2, 1 mM DTT, 1 mM EDTA) as buffer A and
TNGT (50 mM Tris, pH 8.0, 150 mM NaCl, 50 mM reduced GSH, 0.05% Triton
X-100) as buffer B. The crude products were further purified by gel-filtration
chromatography in GF buffer (10 mM Tris, pH 7.5, 200 mM NaCl, 3 mM DTT
and 5% glycerol). The yield was around 1-2 mg per litre of bacterial culture.

Flag-tagged YTHDF2 was cloned into vector pcDNA 3.0 (BamHI, XhoI; forward
primer, CGTACGGATCCATGGATTACAAGGACGACGATGACAAGA
TGTCGGCCAGCAGCG; reverse primer, CGATGCTCGAGTCATTTCCCACG
ACCTTGACG). Flag-tagged YTHDF2 N-terminal domain was made by mutating
E384 (GAA) to a stop codon (TAA) with a Stratagene QuikChange II site-directed
mutagenesis kit (pcDNA-Flag-Y2N; forward primer, CTGGATCTACTCCTTCATAA
CCCCACCCAGTGTTG; reverse primer, CAACACTGGGTGGGGTTATGAA
GGAGTAGATCCAG). Flag-tagged YTHDF2 C-terminal domain was made by
cloning amino acids from E384 to the end into vector pcDNA 3.0 (BamHI, XhoI;
forward primer, CGTACGGATCCATGGATTACAAGGACGACGATGACAA
GGAACCCCACCCAGTGTT; reverse primer, CGATGCTCGAGTCATTTCCC
ACGACCTTGACG). Plasmids with high purity for mammalian cell transfection
were prepared with a Maxiprep kit (Qiagen).
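
Site-directed mutagenesis primers of this kind are normally designed as an exact reverse-complement pair carrying the mutated codon. The short check below is a sketch of how the pcDNA-Flag-Y2N primer pair listed above can be verified; the two sequences are copied from the text with spaces and line breaks removed, while the function and constant names are illustrative assumptions.

```python
# Mutagenesis primers for pcDNA-Flag-Y2N, copied from the text above
# (whitespace and line breaks removed).
FORWARD = "CTGGATCTACTCCTTCATAACCCCACCCAGTGTTG"
REVERSE = "CAACACTGGGTGGGGTTATGAAGGAGTAGATCCAG"

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_are_complementary(forward: str, reverse: str) -> bool:
    """True if the reverse primer is the exact reverse complement of the forward primer."""
    return reverse_complement(forward) == reverse
```

For this pair, `primers_are_complementary(FORWARD, REVERSE)` evaluates to `True`, and the forward primer carries the introduced TAA stop codon immediately before the CCCCACCC stretch that follows codon E384 in the C-terminal cloning primer.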

Tether reporter: pmirGlo Dual luciferase expression vector (Promega) was used
to construct the tether reporter which contains firefly luciferase (F-luc) as the
primary reporter and Renilla luciferase (R-luc) acting as a control reporter for
normalization. F-luc-5BoxB mRNA reporter was obtained by inserting five Box B
sequences (5BoxB) into the 3’ UTR of F-luc (SacI and XhoI; the resulting plasmid
was named pmirGlo-5BoxB). The 5BoxB sequence (see below) was PCR-amplified
from PRL-5BoxB plasmid, which was provided by W. Filipowicz (forward
primer, CGATACGAGCTCTTCCCTAAGTCCAACTACCAAAG; reverse primer,
CTATGGCTCGAGATAATATCCTCGATAGGGCCCG; sequencing primer,
GACGAGGTGCCTAAAGA).

The 5BoxB sequence: TTCCCTAAGTCCAACTACTAAACTGGGGATTCCT 
GGGCCCTGAAGAAGGGCCCCTCGACTAAGTCCAACTACTAAACTGGGC 
CCTGAAGAAGGGCCCATATAGGGCCCTGAAGAAGGGCCCTATCGAGG 
ATATTATCTCGACTAAGTCCAACTACTAAACTGGGCCCTGAAGAAGGG 
CCCATATAGGGCCCTGAAGAAGGGCCCTATCGAGGATATTATCTCGAG. 

To study the decay kinetics of F-luc-5BoxB, another reporter plasmid (pmirGlo- 
Ptight-5BoxB) was constructed by replacing the original human phosphoglycerate 
kinase promoter of F-luc with Ptight promoter (restriction sites: ApaI and BglII).
Ptight promoter was PCR amplified from pTRE-Tight vector (Clontech; forward
primer, CGTACAGATCTCGAGTTTACTCCCTATCAGT; reverse primer, CTG
TAGGGCCCTTCTTAATGTTTTTGGCATCTTCCATCTCCAGGCGATCTG
ACG; sequencing primer, AGCGGTGCGTACAATTAAGG). The resulting plasmid
(pmirGlo-Ptight) was subjected to a second round of subcloning by inserting
5BoxB into the 3’ UTR of F-luc (restriction sites: XbaI and SbfI) to generate
pmirGlo-Ptight-5BoxB (forward primer, CGATACTCTAGATTCCCTAAGTCC
AACTACCAAAC; reverse primer, CTATGGCCTGCAGGATAATATCCTCG
ATAGGGCCC; sequencing primer, GACGAGGTGCCTAAAGA).

Tether effector: λ peptide sequence (MDAQTRRRERRAEKQAQWKAAN) was
fused to the C terminus of N-YTHDF2 by subcloning N-YTHDF2 to pcDNA 3.0
with forward primer containing Flag-tag sequence and reverse primer containing
λ peptide sequence (pcDNA-Flag-Y2Nλ, BamHI, XhoI; forward primer, GATACGG
ATCCATGGATTACAAGGACGACGATGACAAGATGTCGGCCAGCAGCC;
reverse primer, TATGGCTCGAGTCAGTTTGCAGCTTTCCATTGAGCTTGT
TTCTCAGCGCGACGCTCACGTCGTCGTGTTTGTGCGTCCATACCTGAA
GGAGTAGATCCAGAACC). The λ peptide control was designed with a Flag tag
at the N terminus and a GGS spacer (pcDNA-Flag-λ). The primer pair that contains
Flag-tagged λ peptide and sticky restriction enzyme sites (BamHI, XhoI) was
annealed and directly ligated to digested pcDNA 3.0 (forward primer, GAT
CCATGGATTACAAGGACGACGATGACAAGGGTGGTAGCATGGACGCA
CAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAA
AGCTGCAAACTAAG; reverse primer, GAGTTAGTTTGCAGCTTTCCATTG
AGCTTGTTTCTCAGCGCGACGCTCACGTCGTCGTGTTTGTGCGTCCATG
CTACCACCCTTGTCATCGTCGTCCTTGTAATCCATG).

EMSA (electrophoretic mobility shift assay/gel shift assay). The RNA probe was
synthesized by a previously reported method with the sequence of 5'-AUGGGC
CGUUCAUCUGCUAAAAGGXCUGCUUUUGGGGCUUGU-3’ (X = A or m⁶A).
After the synthesis, the RNA probe was labelled in a reaction mixture of 2 µl RNA
probe (1 µM), 5 µl 5× T4 PNK buffer A (Fermentas), 1 µl T4 PNK (Fermentas),
1 µl [32P]ATP and 41 µl RNase-free water (final RNA concentration 40 nM) at
37 °C for 1 h. The mixture was then purified by RNase-free micro bio-spin columns
with bio-gel P30 in Tris buffer (Bio-Rad 732-6250) to remove hot ATP and
other small molecules. To the eluate, 2.5 µl 20× SSC (Promega) buffer was added.
The mixture was heated to 65 °C for 10 min to denature the RNA probe, and
then slowly cooled down to room temperature. GST-YTHDF1, GST-YTHDF2
and GST-YTHDF3 were diluted to concentration series of 200 nM, 1 µM, 5 µM,
20 µM and 100 µM (or other indicated concentrations) in binding buffer (10 mM
HEPES, pH 8.0, 50 mM KCl, 1 mM EDTA, 0.05% Triton-X-100, 5% glycerol,
10 µg ml⁻¹ salmon DNA, 1 mM DTT and 40 U ml⁻¹ RNasin). Before loading to
each well, 1 µl RNA probe (4 nM final concentration) and 1 µl protein (20 nM,
100 nM, 500 nM, 2 µM or 10 µM final concentration) were added and the solution
was incubated on ice for 30 min. The entire 10 µl RNA-protein mixture was loaded
onto the gel (Novex 4-20% TBE gel) and run at 4 °C for 90 min at 90 V. Quantification
of each band was carried out by using a storage phosphor screen (K-Screen; Fuji film)
and Bio-Rad Molecular Imager FX in combination with Quantity One software
(Bio-Rad). The Kd (dissociation constant) was calculated with nonlinear curve
fitting (Function Hyperbl) of Origin 8 software with y = P1 × x/(P2 + x), where
y is the ratio [RNA-protein]/([free RNA] + [RNA-protein]), x is the concentration
of the protein, P1 is set to 1 and P2 is Kd.
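
The one-parameter hyperbolic fit described above (P1 fixed at 1, P2 = Kd) can be sketched in plain Python. This is not the Origin routine used in the study: the least-squares minimization here uses a golden-section search over log10(Kd), which is a swapped-in technique, and the function names and search bounds are assumptions chosen for the example.

```python
import math


def bound_fraction(x, kd):
    """Hyperbolic binding model y = x / (Kd + x), i.e. P1 fixed at 1."""
    return x / (kd + x)


def fit_kd(conc, frac, lo=1e-3, hi=1e6, iters=200):
    """Least-squares estimate of Kd (same units as conc).

    Minimizes the sum of squared residuals over log10(Kd) with a
    golden-section search between the assumed bounds lo and hi.
    """
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((y - bound_fraction(x, kd)) ** 2 for x, y in zip(conc, frac))

    a, b = math.log10(lo), math.log10(hi)
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2)
```

As a usage example, passing the final protein concentrations from the binding reactions (20, 100, 500, 2,000 and 10,000 nM) together with the measured bound fractions returns Kd in nM. The search assumes a single minimum of the residual within the bounds, which holds for clean hyperbolic data but should be checked for noisy measurements.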

Mammalian cell culture, siRNA knockdown and plasmid transfection. The human
HeLa cell line used in this study was purchased from ATCC (CCL-2) and grown in
DMEM (Gibco, 11965) media supplemented with 10% FBS and 1% 100× Pen
Strep (Gibco). HeLa Tet-off cell line was purchased from Clontech and grown in
DMEM (Gibco) media supplemented with 10% FBS (Tet system approved, Clontech),
1% 100× Pen Strep (Gibco) and 200 µg ml⁻¹ G418 (Clontech). AllStars negative
control siRNA from Qiagen (1027281) was used as control siRNA in knockdown 
experiments. YTHDF2 siRNA was ordered from Qiagen as custom synthesis which 
targets 5’-AAGGACGTTCCCAATAGCCAA-3’ near the N terminus of CDS. 
MT-A70 siRNA was ordered from Qiagen: 5'-CGTCAGTATCTTGGGCAAGTT-3’. 
Transfection was achieved by using Lipofectamine RNAiMAX (Invitrogen) for 
siRNA, and Lipofectamine 2000 for single type of plasmid or Lipofectamine LTX 
Plus (Invitrogen) for co-transfection of two or multiple types of plasmids (tether- 
ing assay) following the manufacturer’s protocols. 

RNA isolation. mRNA isolation for LC-MS/MS: total RNA was isolated from
wild-type or transiently transfected cells with TRIzol reagent (Invitrogen). mRNA
was extracted using PolyATtract mRNA Isolation Systems IV (Promega) followed
by further removal of contaminating rRNA by using RiboMinus Transcriptome
Isolation Kit (Invitrogen). mRNA concentration was measured by NanoDrop.
Total RNA isolation for RT-PCR: the RNeasy kit (Qiagen) was used following the
manufacturer’s instructions, with an additional DNase I digestion step. Ethanol
precipitation: to the RNA solution being purified or concentrated, 1/10 volume of
3 M NaOAc, pH 5.5, 1 µl glycogen (10 mg ml⁻¹) and 2.7 volumes of 100% ethanol
were added; the mixture was stored at -80 °C for 1 h to overnight, and then
centrifuged at 15,000g for 15 min. After the supernatant was
removed, the pellet was washed twice by using 1 ml 75% ethanol, and dissolved in
the appropriate amount of RNase-free water as indicated.

In vitro pull down. 0.8 µg mRNA (with 0.2 µg from the same sample saved as input) and
YTHDF1, YTHDF2, YTHDF3 or C-YTHDF2 (final concentration 500 nM) were
diluted into 200 µl IPP buffer (150 mM NaCl, 0.1% NP-40, 10 mM Tris, pH 7.4,
40 U ml⁻¹ RNase inhibitor, 0.5 mM DTT), and the solution was mixed with rotation
at 4 °C for 2 h. For YTHDF1, YTHDF2 and YTHDF3, 10 µl GST-affinity magnetic
beads (Pierce) were used for each sample after being washed four times with
200 µl IPP buffer for each wash. For C-YTHDF2, 20 µl Dynabeads His-Tag
Isolation & Pulldown beads (Invitrogen) were used after being washed four times
with 200 µl IPP buffer for each wash. The beads were then re-suspended in 50 µl
IPP buffer. The protein-RNA mixture was combined with GST or His6 beads and
kept rotating for another 2 h at 4 °C. The aqueous phase was collected, recovered
by ethanol precipitation, dissolved in 15 µl water, and saved as the flow-through.
The beads were washed four times with 300 µl IPP buffer each time. 0.4 ml TRIzol
reagent was added to the beads and the RNA was further purified according to the
manufacturer’s instructions. The purified fraction was dissolved in 15 µl water, and saved as
YTHDF-bound. LC-MS/MS was used to measure the level of m⁶A in each sample
of input, flow-through and YTHDF-bound.

LC-MS/MS. 200-300 ng of mRNA was digested by nuclease P1 (2 U) in 25 µl of
buffer containing 25 mM of NaCl, and 2.5 mM of ZnCl2 at 37 °C for 2 h, followed
by the addition of NH4HCO3 (1 M, 3 µl) and alkaline phosphatase (0.5 U). After an
additional incubation at 37 °C for 2 h, the sample was diluted to 50 µl and filtered
(0.22 µm pore size, 4 mm diameter, Millipore), and 5 µl of the solution was injected
into LC-MS/MS. Nucleosides were separated by reverse phase ultra-performance
liquid chromatography on a C18 column with on-line mass spectrometry detection
using an Agilent 6410 QQQ triple-quadrupole LC mass spectrometer in
positive electrospray ionization mode. The nucleosides were quantified by using
the nucleoside to base ion mass transitions of 282 to 150 (m⁶A), and 268 to 136 (A).
Quantification was performed in comparison with the standard curve obtained
from pure nucleoside standards run in the same batch as the samples. The ratio of
m⁶A to A was calculated based on the calibrated concentrations.
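
The calibration step above (standard curve, then ratio of calibrated concentrations) can be sketched as follows. The linear form of the standard curve, the function names and the example inputs are assumptions for illustration; the text does not state the curve model that was used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (x, y) points: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(peak_area, slope, intercept):
    """Invert a linear standard curve (area = slope * conc + intercept)."""
    return (peak_area - intercept) / slope


def m6a_to_a_ratio(area_m6a, area_a, curve_m6a, curve_a):
    """Ratio of calibrated m6A concentration to calibrated A concentration.

    curve_m6a and curve_a are (slope, intercept) tuples from fit_line,
    one per nucleoside standard.
    """
    return concentration(area_m6a, *curve_m6a) / concentration(area_a, *curve_a)
```

Each nucleoside gets its own curve because the two mass transitions (282 to 150 and 268 to 136) generally have different detector responses, so raw peak areas cannot be divided directly.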

m⁶A profiling. Total RNA was isolated from HeLa cells with TRIzol reagent.
Poly(A)+ RNA was further enriched from total RNA by using FastTrack MAG
Maxi mRNA isolation kit (Invitrogen). In particular, an additional DNase I
digestion step was applied to all the samples to avoid DNA contamination. RNA
fragmentation, m⁶A-seq, and library preparation were performed according to the
previous protocol developed in ref. 14. The experiment was conducted in two
biological replicates (Extended Data Table 1).

RIP-seq. The procedure was adapted from the previous report. 60 million HeLa
cells were collected (three 15-cm plates, after 24 h transfection of Flag-tagged
YTHDF2) by cell lifter (Corning Incorporated), pelleted by centrifugation for 5 min
at 1,000g and washed once with cold PBS (6 ml). The cell pellet was re-suspended
with 2 volumes of lysis buffer (150 mM KCl, 10 mM HEPES pH 7.6, 2 mM EDTA,
0.5% NP-40, 0.5 mM DTT, 1:100 protease inhibitor cocktail, 400 U ml⁻¹ RNase
inhibitor; one plate with ~200 µl cell pellet and ~400 µl lysis buffer), pipetted up
and down several times, and then the mRNP lysate was incubated on ice for 5 min
and shock-frozen at -80 °C with liquid nitrogen. The mRNP lysate was thawed on
ice and centrifuged at 15,000g for 15 min to clear the lysate. The lysate was further
cleared by filtering through a 0.22 µm membrane syringe filter. 50 µl cell lysate was
saved as input, mixed with 1 ml TRIzol. The anti-Flag M2 magnetic beads (Sigma,
20 µl per ml lysate, ~30 µl to each sample) were washed with 600 µl NT2 buffer
(200 mM NaCl, 50 mM HEPES pH 7.6, 2 mM EDTA, 0.05% NP-40, 0.5 mM DTT,
200 U ml⁻¹ RNase inhibitor) four times and then re-suspended in 800 µl ice-cold
NT2 buffer. Cell lysate was mixed with M2 beads; the tube was flicked several
times to mix the contents and then rotated continuously at 4 °C for 4 h. The beads
were collected and washed eight times with 1 ml ice-cold NT2 buffer. Five packed-bead
volumes (~150 µl = 30 µl × 5) of elution solution (500 ng µl⁻¹ 3× Flag
peptide (Sigma) in NT2 buffer) were added to each sample, and the mixture was
rotated at 4 °C for 2 h to elute. The supernatant was mixed with 1 ml TRIzol and
saved as IP. RNA recovered from input was further subjected to mRNA purification
by either Poly(A) selection (replicate 1, FastTrack MAG Micro mRNA isolation
kit, Invitrogen) or rRNA removal (replicate 2, RiboMinus Eukaryote Kit v2,
Ambion). Input mRNA and IP with 150-200 ng RNA of each sample were used to
generate the library using TruSeq stranded mRNA sample preparation kit (Illumina).

PAR-CLIP. We followed the previously reported protocol with the following
modifications. Sample preparation: Five 15-cm plates of HeLa cells were seeded at
Day 1 18:00. At Day 2 10:00, the HeLa cells were transfected with Flag-tagged
YTHDF2 plasmid at 80% confluency. After six hours, the media was changed and
200 µM 4SU was added. At Day 3 10:00, the media was aspirated, and the cells were
washed once with 5 ml ice-cold PBS for each plate. The plates were kept on ice, and
the crosslink was carried out by 0.15 J cm⁻² ultraviolet light. 2 ml PBS was added
and the cells were collected by cell lifter.

Library construction: the final recovered RNA sample was further cleaned by 
RNA Clean & Concentrator (Zymo Research) before library construction by Tru- 
seq small RNA sample preparation kit (Illumina). 

Mild enzyme digestion: The first round of T1 digest was carried out under
0.2 U µl⁻¹ for 15 min instead of 1 U µl⁻¹ for 15 min. The second round of T1 digest
was conducted under 10 U µl⁻¹ for 8 min instead of 50 U µl⁻¹ for 15 min.

Ribosome and polysome profiling. The procedure was adapted from the previous
report. Eight 15-cm plates of HeLa cells were prepared for 48 h knockdown
(siControl, siYTHDF2, four plates each). Before collection, cycloheximide (CHX)
was added to the media at 100 µg ml⁻¹ for 7 min. The media was removed, and the
cells were collected by cell lifter with 5 ml cold PBS with CHX (100 µg ml⁻¹). The
cell suspension was spun at 400g for 2 min and the cell pellet was washed once with
5 ml PBS-CHX per plate. 1 ml lysis buffer (10 mM Tris, pH 7.4, 150 mM KCl,
5 mM MgCl2, 100 µg ml⁻¹ CHX, 0.5% Triton-X-100, freshly added 1:100 protease
inhibitor, 40 U ml⁻¹ SUPERasin) was added to suspend the cells, which were then kept on
ice for 15 min with occasional pipetting and rotating. After centrifugation at
15,000g for 15 min, the supernatant (~1.2 ml) was collected and absorbance tested
at 260 nm (150-200 A260 nm ml⁻¹). To the lysate, 8 µl DNase Turbo was added.
The lysate was then split by the ratio of 1:4 (Portion I/Portion II). 4 µl Super
RNasin was added to Portion I. 40 µl MNase buffer and 3 µl MNase (6,000 gel
units, NEB) were added to Portion II. Both portions were kept at room temperature
for 15 min, and then 8 µl SUPERasin was added to Portion II to stop the reaction.
Portion I was saved and mixed with 1 ml TRIzol to purify input mRNA. Portion II
was used for ribosome profiling.

Ribosome profiling: a 10-50% w/v sucrose gradient was prepared in a lysis buffer
without Triton-X-100. Portion II was loaded onto the sucrose gradient and centrifuged
at 4 °C for 4 h at 27,500 r.p.m. (Beckman, rotor SW28). The sample was
then fractionated and analysed by Gradient Station (BioCamp) equipped with
ECONO UV monitor (BioRad) and fraction collector (FC203B, Gilson). The fractions
corresponding to 80S monosome (not 40S or 60S) were collected, combined,
and mixed with an equal volume of TRIzol to purify the RNA. The RNA pellet was
dissolved in 30 µl water, mixed with 30 µl 2× TBE-urea loading buffer (Invitrogen),
and separated on a 10% TBE-urea gel. A 21-nt and a 42-nt ssRNA oligo were used
as size markers, and the gel band between 21 and 42 nt was cut. The gel was passed
through a needle hole to break the gel, and 600 µl extraction buffer (300 mM
NaOAc, pH 5.5, 1 mM EDTA, 0.1 U ml⁻¹ RNasin) was added. The gel slurry was
heated at 65 °C for 10 min with shaking, and then filtered through 1 ml Qiagen
filter. RNA was concentrated by ethanol precipitation and finally dissolved in 10 µl
of RNase-free water.

Input mRNA: the input RNA was first purified by TRIzol and the input mRNA
was then separated by PolyATract. The resulting mRNA was concentrated by
ethanol precipitation and dissolved in 10 µl of RNase-free water. The mRNA
was fragmented by RNA fragmentation kit (Ambion). The reaction was diluted
to 20 µl and cleaned up by micro Bio-Spin 30 column (cut-off: 20 bp; exchange
buffer to Tris).

Library construction: the end structures of the RNA fragments of ribosome
profiling and mRNA input were repaired by T4 PNK: (1) 3’ de-phosphorylation:
RNA (20 µl) was mixed with 2.5 µl PNK buffer and 1 µl T4 PNK, and kept at 37 °C
for 1 h; (2) 5’-phosphorylation: to the reaction mixture, 1 µl 10 mM ATP and 1 µl
extra T4 PNK were added, and the mixture was kept at 37 °C for 30 min. The RNA
was purified by 500 µl TRIzol reagent, and finally dissolved in 10 µl water. The
library was constructed by Tru-seq small RNA sample preparation kit (Illumina).
The sequencing data obtained from ribosome profiling (portion II) were denoted
as ribosome-protected fragments and that from RNA input (portion I) as mRNA
input. Translation efficiency was defined as the ratio of ribosome-protected
fragments and mRNA input, which reflected the relative occupancy of 80S ribosome
per mRNA species.
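
The definition of translation efficiency above can be illustrated with a short sketch. It assumes that both the ribosome-protected fragments and the mRNA input have already been expressed as RPKM for the same gene. The function names and the numerical values are hypothetical and are not taken from the study.

```python
import math


def translation_efficiency(rpf_rpkm, input_rpkm):
    """Ratio of ribosome-protected-fragment signal to mRNA input signal."""
    if input_rpkm <= 0:
        raise ValueError("input RPKM must be positive")
    return rpf_rpkm / input_rpkm


def log2_fold_change(knockdown_value, control_value):
    """log2(knockdown / control), the 'change fold' convention used in the text."""
    return math.log2(knockdown_value / control_value)


# Hypothetical RPKM values for a single gene (illustrative only).
te_control = translation_efficiency(rpf_rpkm=12.0, input_rpkm=24.0)
te_knockdown = translation_efficiency(rpf_rpkm=9.0, input_rpkm=36.0)
te_change = log2_fold_change(te_knockdown, te_control)

print(te_control, te_knockdown, te_change)
```

In this toy case the input mRNA rises while the ribosome-protected signal does not, so the efficiency falls even though translation output is roughly unchanged.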

Polysome profiling: sample preparation and sucrose gradient were the same as
those of the ribosome profiling procedure except eliminating MNase digestion. The
fractions resulting from sucrose gradient were used for western blotting or pooled
to isolate total RNA for RT-PCR and mRNA for LC-MS/MS test of m⁶A/A ratio.
RNA-seq for mRNA lifetime. Two 10-cm plates of HeLa cells were transfected
with YTHDF2 siRNA or control siRNA at 30% confluency. After 6 h, each 10-cm
plate was re-seeded into three 6-cm plates, and each plate was controlled to contain
the same number of cells. After 48 h, actinomycin D was added to 5 µg ml⁻¹ at 6 h,
3 h, and 0 h before trypsinization collection. The total RNA was purified by RNeasy
kit (Qiagen). Before construction of the library with Tru-seq mRNA sample
preparation kit (Illumina), ERCC RNA spike-in control (Ambion) was added to each
sample (0.1 µl per sample). Two biological replicates were generated: (1) in
replicate 1, RNA spike-in control was added proportional to cell numbers; (2) in
replicate 2, RNA spike-in control was added proportional to total RNA. Although
data obtained from the two sets showed a systematic shift, they led to the consistent
conclusion that YTHDF2 knockdown leads to prolonged lifetime of its RNA targets
(Extended Data Fig. 9).
Data analysis of seq-data. General pre-processing of reads: All samples were
sequenced by Illumina HiSeq 2000 with single-end 100-bp read length. For libraries
generated from small RNA (PAR-CLIP and ribosome profiling), the adapters
were trimmed by using FASTX-Toolkit. The deep sequencing data were mapped
to human genome version hg19 by TopHat version 2.0 without any gaps and
allowing for at most two mismatches. RIP and ribosome profiling were analysed by
DESeq to generate RPKM (reads per kilobase, per million reads). mRNA lifetime
data were analysed by Cuffdiff version 2.0 to calculate RPKM.
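
For readers unfamiliar with the unit, the following sketch spells out the textbook definition of RPKM (reads per kilobase, per million reads). It is not the internal procedure of DESeq or Cuffdiff, which apply their own normalization. The function name and the example numbers are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000-bp transcript in a 10-million-read library.
example_rpkm = rpkm(read_count=500, transcript_length_bp=2000,
                    total_mapped_reads=10_000_000)
print(example_rpkm)
```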

Data analysis for each experiment: (1) for m⁶A profiling, the m⁶A-enriched
regions in each m⁶A-immunoprecipitation sample were extracted by using the
model-based analysis of ChIP-seq (MACS) peak-calling algorithm, with the
corresponding m⁶A-Input sample serving as the input control. For each library,
the enriched peaks with P < 10⁻⁵ were used for further analysis; (2) for RIP,
enrichment fold was calculated as log₂(IP/input); (3) PAR-CLIP data were
analysed by PARalyzer v1.1 with default settings; (4) for ribosome profiling, only
genes with RPKM >1 were used for analysis and the change fold was calculated as
log₂(siYTHDF2/siControl); (5) for mRNA lifetime profiling: RPKM were
converted to attomole by linear-fitting of the RNA spike-in.
The degradation rate of RNA k was estimated by

$$\log_2\left(\frac{A_t}{A_0}\right) = -kt$$

where $t$ is transcription inhibition time (h), $A_t$ and $A_0$ represent mRNA quantity
(attomole) at time $t$ and time 0. Two k values were calculated: time 3 h versus time
0 h, and time 6 h versus time 0 h. The final lifetime was calculated by using the
average of $k_{3h}$ and $k_{6h}$.
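
The decay-rate estimate can be sketched as follows. The mRNA quantities are hypothetical. The text states only that the average of the two k values was used, so converting that average to a half-life as 1/k is an assumption of this sketch. It follows from the base-2 form of the equation, in which the quantity halves when kt = 1.

```python
import math


def decay_rate(a_t, a_0, t_hours):
    """Solve log2(A_t / A_0) = -k * t for k (per hour)."""
    if a_t <= 0 or a_0 <= 0 or t_hours <= 0:
        raise ValueError("quantities and time must be positive")
    return -math.log2(a_t / a_0) / t_hours


# Hypothetical spike-in-calibrated quantities (attomole) at 0 h, 3 h and 6 h.
a_0h, a_3h, a_6h = 8.0, 4.0, 1.0

k_3h = decay_rate(a_3h, a_0h, 3.0)  # 3 h versus 0 h
k_6h = decay_rate(a_6h, a_0h, 6.0)  # 6 h versus 0 h
k_mean = (k_3h + k_6h) / 2

# Assumption: half-life taken as the reciprocal of the averaged base-2 rate.
half_life_h = 1.0 / k_mean

print(k_3h, k_6h, half_life_h)
```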


Integrative data analysis and statistics: PAR-CLIP targets were defined as
reproducible gene targets among three biological replicates (3,251). RIP targets (2,528)
were genes with log₂(IP/input) > 1. The overlap of PAR-CLIP and RIP targets
was defined as CLIP + IP targets (1,277). Non-targets (3,905) had to meet two
conditions: (1) they belonged to the complementary set of PAR-CLIP targets;
(2) their RIP enrichment fold was <0. For the comparison of PAR-CLIP and m⁶A
peaks, an overlap of at least 1 bp was applied as the criterion for overlapping peaks.
Two biological replicates each were conducted for ribosome profiling and mRNA
lifetime profiling. Genes with
sufficient expression level (RPKM >1) were subjected to further analysis. The
change fold used in the main text is the average of the two
log₂(siYTHDF2/siControl) values. A nonparametric Mann-Whitney U-test (Wilcoxon rank-sum
test, two sided, significance level = 0.05) was applied in ribosome profiling data
analysis as previously reported. For the analysis of cell viability (Extended Data
Fig. 8e), RPF of ribosome profiling data were analysed by Cuffdiff version 2.0 for
differential expression test, and the genes that were differentially expressed (P < 0.05)
were subjected to Ingenuity Pathway Analysis (IPA, Ingenuity System). RPF was
chosen since it may better reflect the translation status of each gene.
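
The target definitions above amount to simple set operations, sketched here under stated assumptions. The gene names and enrichment values are hypothetical, and restricting non-targets to genes that have a RIP measurement is a choice made for this sketch.

```python
def classify_targets(clip_targets, rip_log2_enrichment):
    """Return (RIP targets, CLIP + IP targets, non-targets) as sets of gene names.

    clip_targets: genes reproducibly identified by PAR-CLIP.
    rip_log2_enrichment: gene -> log2(IP/input) from RIP.
    """
    rip_targets = {g for g, v in rip_log2_enrichment.items() if v > 1}
    clip_ip_targets = clip_targets & rip_targets
    non_targets = {
        g for g, v in rip_log2_enrichment.items()
        if g not in clip_targets and v < 0
    }
    return rip_targets, clip_ip_targets, non_targets


# Hypothetical inputs (illustrative only).
clip = {"geneA", "geneB", "geneC"}
rip = {"geneA": 2.3, "geneB": 0.4, "geneC": -0.2, "geneD": 1.5, "geneE": -0.8}

rip_set, clip_ip_set, non_target_set = classify_targets(clip, rip)
print(sorted(rip_set), sorted(clip_ip_set), sorted(non_target_set))
```

Genes such as the hypothetical geneB, which is bound in PAR-CLIP but only weakly enriched in RIP, fall into neither the CLIP + IP group nor the non-target group.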

Data accession: all the raw data and processed files have been deposited in
the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo). m⁶A profiling
data are accessible under GSE46705 (GSM1135030 and GSM1135031 are input
samples whereas GSM1135032 and GSM1135033 are immunoprecipitation
samples). All other data are accessible under GSE49339.
RT-PCR. Real-time PCR (RT-PCR) was performed to assess the relative
abundance of mRNA. All RNA templates used for RT-PCR were pre-treated with
on-column DNase I in the purification step. The RT-PCR primers were designed to
span exon-exon junctions in order to further eliminate the amplification of
genomic DNA and unspliced mRNA. When the examined gene had more than one
isoform, only exon-exon junctions shared by all isoforms were selected to evaluate
the overall expression of that gene. RT-PCR was performed by using Platinum
one-step kit (Invitrogen) with 200-400 ng total RNA template or 10-20 ng mRNA
template. HPRT1 was used as an internal control because: (1) HPRT1 mRNA did
not have an m⁶A peak in the m⁶A profiling data; (2) HPRT1 mRNA was not bound by
YTHDF2 in the PAR-CLIP and RIP sequencing data; (3) HPRT1 showed
relatively invariant expression upon YTHDF2 knockdown in the RNA-seq data;
(4) HPRT1 is a house-keeping gene.

YTHDF2: TAGCCAACTGCGACACATTC; CACGACCTTGACGTTCCTTT. 

SON: TGACAGATTTGGATAAGGCTCA; GCTCCTCCTGACTTTTTAGCAA. 

CREBBP: CTCAGCTGTGACCTCATGGA; AGGTCGTAGTCCTCGCACAC. 

PLAC2: AAGCGCTACCACATCAAGGT; CCTCCAACCCAGACTACCTG. 

LDLR: GCTACCCCTCGAGACAGATG; CACTGTCCGAAGCCTGTTCT. 

HPRT1: TGACACTGGCAAAACAATGCA; GGTCCTTTTCACCAGCAAGCT. 

F-luc or F-luc-5BoxB: CACCTTCGTGACTTCCCATT; TGACTGAATCGGACACAAGC.

R-luc: GTAACGCTGCCTCCAGCTAC; CCAAGCGGTGAGGTACTTGT. 

A combination of knockdown/overexpression/RIP/RT-PCR experiments was
conducted to evaluate the occupancy change of YTHDF2 on its RNA targets after
MT-A70 (METTL3) knockdown (Extended Data Fig. 5). Two 15-cm plates of
HeLa cells were transfected with siControl or siMETTL3 siRNA. After 10 h, the
cells were re-seeded. After 14 h, the cells were further transfected with Flag-tagged
YTHDF2 plasmid, and collected after another 24 h (in total 48 h knockdown of
METTL3, 24 h over-expression of Flag-YTHDF2). Anti-Flag beads were used to
separate YTHDF2-bound portion (IP) from unbound portion (flow-through) as
described in the RIP section.

Fluorescence microscopy. Fluorescent immunostaining: the protocol of ref. 26
was followed. The cells were grown in an 8-well chamber (Lab-Tek). After the
treatment indicated in each experiment, the cells were washed once in PBS and then
fixed in 4% paraformaldehyde in PBST (PBS with 0.05% Tween-20; prepared by
mixing paraformaldehyde with PBST and heating at 60 °C until clear, pH ~7.5) at room
temperature for 15 min under rotation. The fixing solution was removed, and
−20 °C chilled methanol was immediately added to each chamber and incubated
for 10 min at room temperature. The cells were rinsed once in PBS and incubated
with blocking solution (10% FBS with PBST) for 1 h at room temperature under
rotation. After that, the blocking solution was replaced with primary antibody
(diluted in blocking solution by the fold indicated in the Antibodies section) and
incubated for 1 h at room temperature (or overnight at 4 °C). After being washed 4
times with PBST (300 µl, 5-10 min for each wash), secondary antibody (1:300
dilution in PBST) was added to the mixture and incubated at room temperature
for 1 h. After washing 4 times with PBST (300 µl, 5-10 min for each wash),
anti-fade reagent (SlowFade, Invitrogen) was added to mount the slides.

FISH in conjunction with fluorescent immunostaining: Stellaris FISH probe
with Quasar 570 was used according to the manufacturer’s instructions. After
the washing step, the sample preparation proceeded to the blocking step of the
previous paragraph in the presence of 40 U ml⁻¹ of RNase inhibitor. Secondary
antibodies were Alexa 488 and Alexa 647 conjugates.

Image capture and analysis: the images were captured by Leica SP5 II STED-CW
super-resolution laser scanning confocal microscope and analysed by ImageJ.
The colocalization was quantified by JACoP (ImageJ plug-in) and the Pearson
coefficients in main text Fig. 3 were obtained under Costes’ automatic threshold.
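
As a reminder of what the Pearson coefficient measures in this setting, the sketch below computes it for paired pixel intensities from two channels. It does not reproduce JACoP or Costes’ automatic thresholding, which selects the pixels to include before the coefficient is calculated. The intensity values are hypothetical.

```python
import math


def pearson_coefficient(channel_a, channel_b):
    """Pearson correlation between paired pixel intensities of two channels."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have equal length of at least 2")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical intensities for the same five pixels in two channels.
green = [10, 40, 80, 120, 200]
magenta = [12, 35, 90, 110, 210]
r_example = pearson_coefficient(green, magenta)
print(r_example)
```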
Protein co-immunoprecipitation. HeLa cells expressing Flag-tagged YTHDF2,
N-YTHDF2, C-YTHDF2 or pcDNA3.0 blank vector were collected by cell lifter
(three 15-cm plates for each), and pelleted by centrifugation at 400g for 5 min. The cell
pellet was resuspended with 2 volumes of lysis buffer (the same as the one used in
RIP), and incubated on ice for 10 min. To remove the cell debris, the lysate solution
was centrifuged at 15,000g for 15 min at 4 °C, and the resulting supernatant was
passed through a 0.22-µm membrane syringe filter. While 50 µl of cell lysate was
saved as Input, the rest was incubated with the anti-Flag M2 magnetic beads
(Sigma) in ice-cold NT2 buffer (the same as the one used in RIP) for 4 h at
4 °C. Afterwards, the beads were subjected to extensive washing with 8 × 1 ml portions
of ice-cold NT2 buffer, followed by incubation with the elution solution containing
3× Flag peptide (0.5 mg ml⁻¹ in NT2 buffer, Sigma) at 4 °C for another 2 h. The
eluted samples, saved as IP, were analysed by western blotting. For IP samples,
each lane was loaded with 2 µg IP portion; and the input lane was loaded with
10 µg Input portion which corresponded to ~1% of overall input.
Tether assay. Basic setting: 100 ng reporter plasmid (pmirGlo or pmirGlo-5BoxB)
and 500 ng effecter plasmid (pcDNA-Flag-λ, pcDNA-Flag-Y2Nλ, or
pcDNA-Flag-Y2N) were used to transfect the HeLa cells in each well of a six-well plate at
60~80% confluency. After 6 h, each well was re-seeded into a 96-well plate (1:20)
and a 12-well plate (1:2). After 24 h, the cells in the 96-well plate were assayed by
Dual-Glo Luciferase Assay Systems (Promega). Firefly luciferase (F-luc) activity was
normalized by Renilla luciferase (R-luc) to evaluate the translation of the reporter.
Samples in the 12-well plate were processed to extract total RNA (DNase I
digested), followed by RT-PCR quantification. The amount of F-luc mRNA was
also normalized by that of R-luc mRNA.
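
The normalization described above is a ratio of ratios, as the following sketch shows. It assumes raw luminescence readings, or mRNA quantities, for F-luc and R-luc from the same well. The function names and the readings are hypothetical.

```python
def normalized_reporter(f_luc, r_luc):
    """F-luc signal divided by the R-luc signal from the same sample."""
    if r_luc <= 0:
        raise ValueError("R-luc signal must be positive")
    return f_luc / r_luc


def relative_to_control(sample_ratio, control_ratio):
    """Express a normalized reporter value as a fraction of the control value."""
    return sample_ratio / control_ratio


# Hypothetical luminescence readings (arbitrary units).
tethered_ratio = normalized_reporter(f_luc=4000.0, r_luc=10000.0)
control_ratio = normalized_reporter(f_luc=5000.0, r_luc=10000.0)
relative_signal = relative_to_control(tethered_ratio, control_ratio)
print(tethered_ratio, control_ratio, relative_signal)
```

Dividing by R-luc first removes well-to-well differences in transfection efficiency before the tethered and control effecters are compared.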

RNA immunoprecipitation: Two 15-cm plates of HeLa cells were transfected
with 1 µg pmirGlo-5BoxB reporter and 5 µg pcDNA-Flag-Y2Nλ effecter plasmids
for each plate. After 24 h, the samples were processed as described in the RIP section.
The RNA recovered from Input, IP and FT portions was used in the poly(A) tail
assay.

RNA decay: 200 ng reporter plasmid (pmirGlo-Ptight-5BoxB) and 1 µg effecter
plasmid (pcDNA-Flag-λ, pcDNA-Flag-Y2Nλ, or pcDNA-Flag-Y2N) were used
for each 6-cm plate to transfect the HeLa Tet-off cell line (Clontech) in the presence
of 400 ng doxycycline (Dox, Clontech). The transcription of F-luc-5BoxB was
under repression at this stage. After 18 h, the cells in each 6-cm plate were washed
twice with PBS, trypsinized, and washed twice with Dox-free media, then split into
four equal portions and re-seeded to a 12-well plate in Dox-free media. After 4 h
pulse transcription of F-luc-5BoxB, Dox was added to 400 ng in each well. The first
time point (t = 0 h) was taken after 20 min, followed by 2 h, 4 h and 6 h. Total RNA
extracted from each sample was used for RT-PCR analysis and poly(A) tail length
assay.

Poly(A) tail length assay. Poly(A) tail length assay was performed by using
Poly(A) Tail-Length Assay kit (Affymetrix) as previously reported. The protocol
of the manufacturer (Extended Data Fig. 7f-l) was followed, with 30 cycles of
two-step PCR at the last step, and the products were then visualized on 10% non-denaturing TBE gel. The
forward primer of F-luc-5BoxB is 5'-CCGCTGAGCAATAACTAGCA-3’, and
the gene-specific reverse primer is 5'-TGCAATTGTTGTTGTTAACTTGTTT-3’.
The forward primer of CREBBP mRNA is 5'-GTCTTGGGCAATCCAGATGT-3’,
and the gene-specific reverse primer is 5'-TTTGAATCCAAGTAGTTTTACCATC-3’.
Antibodies. The antibodies used in this study are listed below in the format
of name (application; catalogue; supplier; dilution fold): Rabbit anti-YTHDF1
(Western; ab99080; Abcam; 1,000). Rabbit anti-YTHDF3 (Western; ab103328;
Abcam; 1,000). Mouse anti-Flag HRP conjugate (Western; A5892; Sigma; 5,000).
Rabbit anti-MT-A70 (Western; 15073-1-AP; Proteintech Group; 3,000). Rabbit
anti-FTO (Western; 5325-1; Epitomics; 10,000). Goat anti-GAPDH HRP
conjugate (Western; A00192; GenScript; 15,000). Rabbit anti-DCP2 (Western; Ab28658;
Abcam; 1,000). Rabbit anti-m⁶A (m⁶A-seq; 202003; Synaptic Systems; 4 µg per seq).
Rat anti-Flag (IF; 637304; Biolegend; 300). Mouse anti-DCP1a (IF; WH0055802M6;
Sigma; 300). Mouse anti-GW182 (4B6) (IF; ab70522; Abcam; 100). Rabbit
anti-DDX6 (IF; a300-461A; Bethyl Lab; 250). Anti-HuR (IF; WH0001994M2; Sigma;
50). Goat anti-eIF3 (N-20) (IF; sc-16377; Santa Cruz Biotech; 100). Mouse
anti-CNOT7 (IF; sc-101009; Santa Cruz Biotech; 100). Goat anti-PAN2 (C-20) (IF;
sc-82110; Santa Cruz Biotech; 100). Anti-PARN (IF; ab27778; Abcam; 100). Donkey
anti-rat Alexa 488 (IF; A21208; Molecular Probes; 300). Goat anti-rabbit Alexa 647
(IF; A21446; Molecular Probes; 300). Goat anti-mouse Alexa 647 (IF; A21236;
Molecular Probes; 300). Donkey anti-goat Alexa 647 (IF; A21447; Molecular
Probes; 300).

Extended Data Figure 1 | YTH domain family members are m⁶A-specific
RNA binding proteins. a, Western blot showing YTHDF1 and YTHDF3
pulled down with an m⁶A-containing RNA probe. *Thiol-substituted
phosphodiester bonds were used to prevent enzymatic cleavage. b, LC-MS/MS
showing that m⁶A was enriched in GST-YTHDF1- or GST-YTHDF3-bound
mRNA while depleted in the flow-through portion. c, d, Gel-shift assay
measuring the dissociation constant (Kd, nM, indicated at the upper left corner
of the gel) of GST-tagged YTH domain family proteins (c, YTHDF2;
d, YTHDF1 and YTHDF3) with methylated and unmethylated RNA probes.
4 nmol RNA probe was labelled with ³²P and the protein concentration ranged
from 20 nM to 5 µM.

Extended Data Figure 2 | Features and comparisons of YTHDF2 PAR-CLIP
data with RIP and m⁶A-seq. a, Left, PAR-CLIP gel image showing ³²P-labelled
RNA-YTHDF2 complex; right, western blotting of HeLa cell lysate
with overexpression of Flag-tagged YTHDF2 (10 µg per lane). Upper band was
detected by anti-Flag antibody; lower band was detected by anti-GAPDH
antibody. b, Overlap of transcripts identified by PAR-CLIP and RIP-seq of
YTHDF2. c, d, YTHDF2 binding motif identified by MEME with top 1,000
scored PAR-CLIP peaks under different motif searching parameters. c, With
motif length restricted to 5-10 bp, P = 1.1 X 10 73, 183 sites were found under
this motif. d, The motif length was restricted to 5-12 bp. The motif with lowest
P value was shown in main text as Fig. 1c, this motif showed the second lowest P
value, P= 5.1 X 10 *4, 104 sites were found. e, With 7-12 bp, P= 7.5 X 10 ”,
231 sites were found under this motif. f, Distribution of PAR-CLIP peaks across
the length of mRNA. Each region of 5’ UTR, CDS, and 3’ UTR was binned
into 50 segments, and the percentage of PAR-CLIP peaks that fall within each
bin was determined. g, Overlap of YTHDF2 PAR-CLIP peaks with m⁶A peaks
in different sub-transcript regions. Over 70% PAR-CLIP peaks in 5’ UTR, CDS,
stop codon, and 3’ UTR regions overlap with m⁶A peaks (at least 1-bp overlap).
In contrast, only 20%~30% of PAR-CLIP peaks in transcription starting sites
(TSS) and intergenic regions coincide with m⁶A peaks. h, Enrichment of
YTHDF2 PAR-CLIP peaks in long exons. The length distribution of exons that
contain YTHDF2 PAR-CLIP peaks (red) shifts to larger size compared with the
length distribution of all exons in the human genome (black).

Extended Data Figure 3 | Effects of YTHDF2 knockdown and summary of
the sequencing data. a, The YTHDF2 knockdown efficiency is about 80% as
detected by RT-PCR (error bars, mean ± s.d., n = 3, biological replicates) and
RNA-seq. Although at current stage we could not identify a reliable antibody
for YTHDF2, ribosome-profiling of YTHDF2 did indicate that the translation
level of YTHDF2 decreased by 80% after siRNA knockdown. RT-PCR results
were normalized to that of GAPDH as an internal control. RNA-seq and
ribosome profiling results were calculated by actual RPKM. b, YTHDF2
knockdown led to decreased translation efficiency of its targets due to the
accumulation of non-translating mRNA. Translation efficiency is calculated as
the ratio of ribosome-protected fragments and mRNA input. P value was
calculated by using Mann-Whitney U-test (two-tailed, significance
level = 0.05). c, Multiple pairwise comparisons (Kruskal-Wallis test) by using
the Steel-Dwass-Critchlow-Fligner procedure (two-tailed, significance
level = 0.05). d, The regional effect of the YTHDF2-binding site is not
significant. Cumulative distribution showing mRNA lifetime log₂ fold changes
(Δ) between siYTHDF2 and siControl for non-targets and CLIP-IP common
targets with major CLIP peak at 5’ UTR, CDS, 3’ UTR, intron, and non-coding
RNA. Except for intron, other regions show similar fold changes (also see
Extended Data Fig. 3c). e, The m⁶A methyltransferase (MT-A70) and
demethylase (FTO) remain unchanged with YTHDF2 knockdown.

Extended Data Figure 4 | Validation of representative YTHDF2 RNA
targets. a-d, Examples of transcripts harbouring m⁶A peaks and YTHDF2
PAR-CLIP peaks: SON (CDS, a), CREBBP (3' UTR, b), LDLR (3' UTR,
c), PLAC2 (non-coding RNA, d). Coverage of m⁶A immunoprecipitation and
input fragments are indicated in red and blue, respectively. YTHDF2 PAR-CLIP
peaks are highlighted in green. Black lines signify CDS borders.
e-n, Relative RNA level quantified by gene-specific RT-PCR, and error bars
shown in these figure panels are mean ± s.d., n = 6 (two biological
replicates × three technical replicates). e, Enrichment fold of SON, CREBBP
mRNA, and PLAC2 RNA in YTHDF2-RNA coimmunoprecipitation versus
RNA-protein input control, and in m⁶A in vitro immunoprecipitation versus
mRNA input control. f, Relative changes of SON, CREBBP mRNA, and PLAC2
RNA in siYTHDF2 sample versus siControl, and overexpression of YTHDF2
versus overexpression of C-YTHDF2. g-k, Lifetimes of SON, CREBBP mRNA
and PLAC2 RNA under siYTHDF2 versus siControl. l-n, YTHDF2
knockdown altered the cytoplasmic distribution of its mRNA targets. The SON
(l) and CREBBP (m) mRNA levels decreased in the non-ribosome mRNP
portion but increased in the 40S-80S portion under siYTHDF2 compared to
siControl. However, they showed different changes in the polysome portion.
RPL30 (n) is not a target of YTHDF2 and did not show an increase in the 40S-
80S portion.

Extended Data Figure 5 | Knockdown of METTL3 (MT-A70) led to
decreased binding of YTHDF2 to its targets and increased stability of its
target RNAs similar to that of YTHDF2 knockdown. a, Western blotting
showing that the knockdown efficiency of siMETTL3 at 48 h was ~80%.
b-g, Relative RNA level quantified by gene-specific RT-PCR, and error bars
shown in these figure panels are mean ± s.d., n = 6 (two biological
replicates × three technical replicates). b, Percentages of YTHDF2 targets
(SON, CREBBP, LDLR) in YTHDF2-bound portion versus unbound portion
decreased significantly after METTL3 knockdown for 48 h. After 24 h
transfection of METTL3 siRNA, HeLa cells were transfected with Flag-tagged
YTHDF2, and cells were collected after another 24 h. Anti-Flag beads were used
to separate YTHDF2-bound portion (IP) from unbound portion
(flow-through). Each transcript was quantified by RT-PCR. c, Relative changes of
SON, CREBBP and LDLR mRNA in siMETTL3 sample versus siControl.
d-g, Lifetimes of SON, CREBBP, and LDLR mRNA under siMETTL3 versus
siControl.

Extended Data Figure 6 | Co-localization of YTHDF2 with protein markers
of P bodies, stress granules, and deadenylation complexes. a-h, Fluorescence
immunostaining of Flag-tagged YTHDF2 (green, anti-Flag, Alexa 488) and
other protein markers (DCP1a and GW182 for P bodies and eIF3 for stress
granule, DDX6 (also known as RCK/p54) and HuR for both, CNOT7, PAN2,
and PARN for deadenylation complex; magenta of Alexa 647 is the colour for
the marker, green + magenta = white for the co-localization spot). The scale
of the magnified region (white frame) is 1.8 µm × 1.8 µm. i, Co-localization
between YTHDF2 and different protein markers was characterized by
Pearson’s coefficient, for each pair, n = 5~7. YTHDF2 seems to have better
co-localization with P bodies than stress granules. It also seems to co-localize
best with CNOT7 (also known as CAF1 or POP2) which is a subunit of the
CCR4-NOT deadenylation complex. j, Western blotting results showing that
immunoprecipitation (IP) of Flag-tagged full length YTHDF2 and N-YTHDF2
(N-terminal domain) also pulled down the P-body marker DCP2, but not with
mock control or C-YTHDF2 (the C-terminal domain). For IP samples, each
lane was loaded with 2 µg IP portion; and the input lane was loaded with
10 µg input portion which corresponded to ~1% of overall input.
k, Comparison of P/Q/N (highlighted) rich regions of YTHDF1-3 with other
aggregation-prone proteins. l, C-YTHDF2 is capable of selective binding of
m⁶A-containing RNA. LC-MS/MS showing that m⁶A-containing RNA was
enriched in the His6-tagged C-YTHDF2-bound mRNA while reduced in the
flow-through portion. Error bars shown in the figure are mean ± s.d., n = 4
(two biological replicates × two technical replicates).

Extended Data Figure 7 | Tether assay of the N-terminal domain of
YTHDF2. a, Structural presentation of the two domains of YTHDF2.
b, Scheme of the reporter assay: the RNA reporter vector encodes firefly
luciferase (F-luc) as the primary reporter and Renilla luciferase (R-luc) on the
same plasmids acting as transfection control for normalization. Five Box B
RNA elements were inserted at the 3’ UTR of F-luc as positive tether reporter
(noted as F-luc-5BoxB); the effecter was a fusion of N-YTHDF2 and λ peptide
which recognizes Box B with high affinity. c, The F-luc luciferase activity
(protein translation) for N-YTHDF2-λ was reduced by ~20% compared to
that of N-YTHDF2 and λ controls. Error bars shown in the panel are mean
values ± s.d. from n = 8 (biological replicates). d, e, The reporter mRNA
lifetime was significantly reduced (~40%) when bound by N-YTHDF2-λ as
compared to the controls of N-YTHDF2 and λ. Doxycycline (Dox, 400 ng µl⁻¹)
was used to inhibit transcription of the reporter. 18 h post transfection of
reporter and effecters, Dox was removed to allow a pulse transcription of
F-luc-5BoxB for 4 h. Then Dox was added back and the samples were collected
at indicated time point. The amounts of F-luc-5BoxB were determined by
RT-PCR, normalized to R-luc, then for each time series, samples at t = 0 h were
set as 100%. Error bars shown in the panel are mean ± s.d., n = 6 (two
biological replicates × three technical replicates). f, Scheme of poly(A) tail
length assay. g, h, Tethering N-YTHDF2 to the reporter mRNA does not
significantly trigger deadenylation of the reporter. The PCR products of reporter
poly(A) tail were visualized in 10% TBE gel stain (g) and no significant
difference of the deadenylation rate was observed (h). i-l, Shorter poly(A) tail
lengths were observed in the YTHDF2-bound fraction for the N-YTHDF2-tethered
reporter RNA (i and j) as well as the native target RNA CREBBP
(k and l). Tether reporter F-luc-5BoxB and Flag-tagged YTHDF2-N-λ (i) or full
length Flag-tagged YTHDF2 (k) were expressed in HeLa cells, and subjected
to immunoprecipitation with anti-Flag beads. RNA recovered from input,
IP and flow-through were further processed and the final PCR products for
F-luc-5BoxB (i) or CREBBP (k) were visualized in 10% TBE gel. j and l, each
lane was re-plotted against base pair, after log fitting of relative gel mobility
with base pairs.

Extended Data Figure 8 | Cellular function of YTHDF2. a, b, The top
molecular function of YTHDF2 targets is “Gene Expression and RNA
Transcription”, and the top cellular function is “Cell Death and Survival”.
Ingenuity Pathway Analysis of function category of YTHDF2 targets and
non-targets revealed that the two gene groups are heterogeneous in their
functional composition. (*top two functions for YTHDF2 targets and
**top two functions for YTHDF2 non-targets.). c, d, Pie charts of molecular
types of differentially expressed YTHDF2 targets (c) versus non-targets
(d) upon YTHDF2 knockdown. Differentially expressed genes (P value <0.05)
caused by YTHDF2 knockdown were grouped to YTHDF2 targets (796 genes)
and non-targets (1,554) based on their presence or absence in YTHDF2
PAR-CLIP binding sites, and subject to Ingenuity Pathway Analysis
(the category “other” was not shown). The results show that the group of
YTHDF2 targets is transcription regulators whereas that of non-targets is
enzyme, indicating that m⁶A may significantly affect gene expression via tuning
mRNA stabilities of transcription factors through YTHDF2. e, f, YTHDF2
knockdown led to reduced cell viability. The IPA analysis of ribosome profiling
data of YTHDF2 knockdown (48 h) versus control predicts decreased cell
viability (e). Ribosome profiling data was chosen since it may better reflect the
translation status. MTT assay provided experimental evidence of reduced cell
viability upon YTHDF2 knockdown. P values that were calculated from
Student’s t-test were 0.036, 4.7 X 10°, and 9.4 × 10⁻⁴, at 48 h, 72 h and 96 h
respectively (f). Error bars shown in the figure are mean ± s.d., n = 10
(biological replicates).

Extended Data Figure 9 | Comparisons of sequencing data with replicates.
a, Overlap of three biological replicates (rep1-rep3) for PAR-CLIP. Numbers
showing the sum of genes identified in each sample. b, Correlation of
enrichment fold as log₂(IP/input) between two technical RIP replicates. In rep1
the input mRNA was purified by poly(dT) beads, whereas in rep2 the input
RNA was processed by rRNA removal. c-e, Box plot showing consistent results
from two biological replicates that were conducted for ribosome profiling and
mRNA lifetime profiling, respectively. For mRNA lifetime profiling, rep1 was
normalized by spike-in control that was proportional to cell numbers, whereas
rep2 was normalized by spike-in that was proportional to total RNA
concentrations. Despite the technical variations, YTHDF2 knockdown resulted
in significant lifetime increase of its targets. (T, 1,277 CLIP+RIP targets;
NT, 3,905 non-targets; box, the first and third quartiles; notch, the median;
dot in the box: the data average; whisker, 1.5 × standard deviation; cross,
the 1 and 99 percentiles; short line, the maximum and minimum; P values
were calculated by Mann-Whitney U-test, two-tailed, significance level = 0.05).
f-h, Correlation of RPKM between technical mRNA input samples prepared by
poly(A) selection (x axis) and by rRNA removal (y axis), which are comparable
to the variations between biological replicates that were prepared by the same mRNA
selection method.

Extended Data Table 1 | Summary of the sequencing samples

Experiment                Sample / replicates               Mapped reads
PAR-CLIP                  rep1                              22398274
                          rep2                              16469620
                          rep3                              18340002
m⁶A profiling             rep1-input                        33037301
                          rep1-IP                           48299395
                          rep2-input                        28182497
                          rep2-IP                           46985896
RIP                       IP-rep1                           13956595
                          IP-rep2                           6658433
                          input polyA                       13995632
                          input ribominus                   11201129
Ribosome profiling        rep1-SiControl-RPF                6153002
                          rep1-SiControl-input-polyA        24693835
                          rep1-SiYTHDF2-RPF                 4396160
                          rep1-SiYTHDF2-input-polyA         23772645
                          rep2-SiControl-RPF                10302755
                          rep2-SiControl-input-polyA        11276363
                          rep2-SiControl-input-ribominus    7336313
                          rep2-SiYTHDF2-RPF                 9830313
                          rep2-SiYTHDF2-input-polyA         9525286
                          rep2-SiYTHDF2-input-ribominus     14030008
mRNA lifetime profiling   rep1-SiControl-TI-0h              19703956
                          rep1-SiControl-TI-3h              17066177
                          rep1-SiControl-TI-6h              25105141
                          rep1-SiYTHDF2-TI-0h               23291878
                          rep1-SiYTHDF2-TI-3h               18905279
                          rep1-SiYTHDF2-TI-6h               27471654
                          rep2-SiControl-TI-0h              9709614
                          rep2-SiControl-TI-3h              10252359
                          rep2-SiControl-TI-6h              13315823
                          rep2-SiYTHDF2-TI-0h               9996766
                          rep2-SiYTHDF2-TI-3h               11123149
                          rep2-SiYTHDF2-TI-6h               7543209

COMMUNICATION

Spontaneous scientists


Some think that researchers can improve their 
communication by flexing their improvisation skills. 


BY RACHEL BERNSTEIN 


A circle of scientists is gazing skyward, as if
watching a ball fly through the air as
they play an animated game of catch.
But there is no ball — and this game is serious 
work. It is part of an exercise to help 12 scien- 
tists at the University of Connecticut (UConn) 
Health Center in Farmington to boost their 
communication skills. 
These scientists are engaged in improvisation, 
a spontaneous, reactive interaction mode more 
traditionally seen in comedy performances. 


Improvisation games and communication 
both require attention to others and the forging 
of personal connections, says Raquell Holmes, 
a cell biologist by training who now spends 
much of her time running workshops for sci- 
entific conferences and research institutions 
through her company, improvscience, based 
in Boston, Massachusetts. With that in mind, 
she has adapted some traditional improvisation 
exercises, and imaginary catch is one of them. 
In this game, participants use eye contact to 
indicate where the ‘ball’ is being thrown, and 
use and read body language to communicate its 
size and weight so that the recipient can catch it 
correctly. The skills learned in these games can 
be directly transferred to scientists’ work pur- 
suits, says Cibele Falkenberg, a computational- 
biology postdoc at UConn Health Center who 
has participated in some of the workshops. 
For instance, she says, “with communication, 
you have to make sure you have a connection 
before you pass the message”, which applies to 
any audience, including co-workers, funding 
agencies and the public. Convinced of the ben- 
efits, improvscience is one of a number of US 
programmes using improvisation to help hone 
these skills (see ‘Workshops and events’). 


COMMUNICATIVE COLLABORATION 

Researchers sometimes fall short in their 
communication with each other, despite the 
importance of collaboration. Holmes thinks 
that improvisation offers a powerful tool to 
address this problem — through, for example, 
the ‘yes, and’ rule. This basic tenet of improvisation 
dictates that participants must say ‘yes’ 
to any verbal or physical cues that they receive 
and build on them, rather than trying to shut 
down a direction that makes them uncomfort- 
able. The rule is important in a research con- 
text, in which a ‘no, but’ stance often dominates 
— such as when discussing a colleague's results 
or critiquing a paper in a journal-club meeting. 

From a scientific perspective, this critical 
approach may be appropriate and necessary. 
But taken too far, Holmes says, it can create a 
negative group dynamic and make some peo- 
ple hesitant to share ideas for fear of ridicule. 
And that, in turn, could slow research progress. 

To illustrate this problem, in one of her games 
Holmes asks participants to get into pairs and 
work together to plan a party. First, members 
of each pair can respond to each other’s state- 
ments only by starting with ‘no, but’; they then 
repeat the exercise using the ‘yes, and’ rule. The 
‘no, but’ approach made it “very difficult to have 
a meaningful conversation”, says Max Staller, a 
systems-biology graduate student at Harvard 
University in Cambridge, Massachusetts, who 
has participated in several improvscience 
workshops. In stark contrast, the ‘yes, and’ rule 
worked so well in planning the fictitious party 
that he now applies it to his research. 

“I try to consciously think about, is there a 
way to say ‘yes, and’,” Staller says. “I make a 
point in journal club of talking about what's 
positive about the paper; sometimes we focus 
too much on the shortcomings, and take for 
granted the successes.” 

Holmes also uses games such as ‘mirrors’, 
in which partners have to match each other’s 
body movements as closely as possible, and 
‘emotional bus’, in which participants portray 
an emotion as they board an imaginary bus, 
and the rest of the ‘passengers’ must read the 
emotion and act it out themselves. The point 
is to train people to be tuned in and respon- 
sive to others around them. “It looks like we’re 
just playing, but what we're practising is how 
to accept the idea that your colleague is giving 
you something,” explains Falkenberg. 


AUDIENCE CONNECTION 

In some cases, improvisation came to science 
through the inspired efforts of people who had 
a penchant for drama. Patricia Ryan Madson, 
an emeritus lecturer in theatre and perfor- 
mance studies at Stanford University in Palo 
Alto, California, is no scientist. But when she 
began teaching improvisational theatre classes 
to undergraduates in 1991, she found that 
many of her students were scientists and engi- 
neers. It got her thinking about how research- 
ers, too, could benefit from the confidence that 
practising improvisation inspires and become 
more comfortable, she says, being “agreeable 
and helpful and communicative, and making 
mistakes in a gentle way” that is “not de rigueur 
in the science world”. She found a partner in 
engineering professor Rolf Faste, and in the 
late 1990s the two launched an improvisation 
course that is still offered today. 

About 15 years after Madson's initial course, 
actor and long-time science aficionado Alan 
Alda came to the idea independently. After 
hosting a show called Scientific American Fron- 
tiers from 1993 to 2005, Alda knew he wanted 
to continue working in science communica- 
tion. He remembered how his early training in 
improvisational theatre had boosted his com- 
munication skills, and thought it might be val- 
uable for scientists. In January 2008, he tested 
the idea with a group of engineering students 
from the University of Southern California in 
Los Angeles by asking them to explain their 
research before and after playing improvisation 
games. “The difference was startling,” he 
recalls. This convinced him that the approach 
was worth pursuing. 

Alda now implements the idea with the Alan 
Alda Center for Communicating Science at 
Stony Brook University in New York, which 
offers an Improvisation for Scientists course 
for the university’s graduate students and is 
working to share the curriculum by develop- 
ing an affiliate network. The first institution to 
sign up, Dartmouth College in Hanover, New 
Hampshire, began an improvisation course 
for scientists in the autumn. The Alan Alda 
Center also provides workshops for students 
and faculty members at various conferences 
and science institutions. 

Whereas Holmes’ approach focuses primarily 
on facilitating teamwork, Alda emphasizes 
communicating with non-experts, whether 
interacting with the media, policy-makers 
or the public. But the tools are largely 
the same: encouraging participants to be 
attuned to others and adapt their behaviour 
accordingly. 

In one Alda Center activity, one member 
of a pair speaks about a general topic and 
the other person tries to anticipate what the 
first will say, so that the two end up saying 
the same thing at the same time. “It 
sounds impossible,” says Colin West, a Stony 
Brook physics graduate who has taken the 
Alda Center course, “but when you get into a 
rhythm, you can anticipate a lot based on body 
language, and the next thing you know you're 
chanting in unison.” 

“With communication, you have to make sure you have a connection before you pass the message.” 
Cibele Falkenberg 

IMPROVISATION RESOURCES 

Workshops and events 

Improvisation resources are still relatively 
rare. Both improvscience (improvscience.org) 
and the Alan Alda Center 
(centerforcommunicatingscience.org) work 
with researchers and institutions to set up 
workshops a few hours to a few days long. 
Interested researchers can also attend 
the ‘Improvisation for Scientists: Making a 
Human Connection’ session on 16 February 
2014 in Chicago, Illinois, featuring the Alda 
Center, improvscience and Katie Watson, 
who specializes in medical humanities and 
bioethics at Northwestern University in 
Evanston, Illinois, and is a faculty member 
of renowned Chicago-based improvisation 
group The Second City. R.B. 

The Alan Alda Center in New York uses improvisation exercises to boost science communication. 

Researchers who are well practised in such 
an exercise can use their newly honed receptiveness 
to help them respond to their audiences. 
“When you’re reading the signals,” says 
Alda, “you have instantaneous feedback about 
whether they're understanding what you’re saying. 
You can adjust what you say because you're 
accustomed to adjusting.” Alda also uses exercises 
to practise this skill in a scientific context, 
including one in which a participant presents 
his or her research to a ‘revolving door’ of 
audiences. In this game, the instructor assigns 
the audience an identity, such as a high-school 
class or an academic tenure committee, with- 
out telling the speaker. The audience mem- 
bers behave in keeping with their identity, and 
the speaker has to interpret their cues to tailor 
his presentation to them — until the instruc- 
tor assigns the audience a new identity and 
the speaker has to adjust seamlessly. 

These improvisation exercises helped 
Philip Fernandes, a graduate student in astro- 
physics at Dartmouth who recently took the 
Alda Center course. “I realized that when I 
give a talk, I memorize it completely because 
I'm terrified I’m going to go blank, so it turns 
into this robotic reciting,” he says. Fernandes 
began watching audiences and reading body 
language. “If I see that I'm losing them, it’s my 
job to regain their interest.” 

But it is not enough to simply be respon- 
sive to the audience. Presenters also need to 
give the audience something that will help 
them to forge a connection with the speaker 
and stay engaged. Alda teaches that the best 
hook is to include one’s personal perspec- 
tive, regardless of the audience or venue, so 
that listeners can relate — although some 
researchers may not be comfortable with 
this approach at first because “there’s this 
idea that emotion doesn't belong in science”, 
Fernandes notes. After taking the course 
and seeing how it improved her communi- 
cation skills, Dartmouth ecology graduate 
student Jessica Trout-Haney was convinced 
that Alda’s approach had merit. “Telling a 
personal story has made giving talks much 
more authentic, and fun,” she says. Initially, 
she had concerns that inserting subjective 
personal details would signify her science 
was not objective. “But if we don’t connect 
with the audience, we're doing a disservice to 
the science.” Improvisation helps participants 
to become more comfortable sharing a side 
of themselves that they may otherwise hide. 


In some exercises, participants must trust 
the first idea that comes to them — and that 
thought is instinctive and personal. 

Students are videoed giving a short 
research presentation before and after the 
course to show them their progress. Krithika 
Venkataraman, a graduate student in molec- 
ular and cellular biology at Stony Brook, says 
that she saw dramatic results. “I was more 
enthusiastic, more relaxed when I was up on 
the stage, and I had a smile on my face, which 
I didn't have before,” she says. “I was able to 
face the audience and be energetic and enthu- 
siastic about my science.” 


DIVE IN 

Trying improvisation for the first time can be 
nerve-racking, says Christine Urbanowicz, 
a graduate student in ecology at Dartmouth 
who took the Alda course. Despite her initial 
misgivings, she found it rewarding. When the 
group played imaginary catch during the first 
meeting, “I kept telling people, ‘don’t pass me 
the ball, don't pass me the ball’, even though 
there wasn’t a ball”, she recalls. But now that 
she has completed the course, her confidence 
as a speaker has increased because she knows 
that she can handle unexpected situations. 
“Improv allows you to trust yourself enough 
to know that you'll be able to figure out where 
you’re going with your presentation without 
having it memorized,” she says. 

In fact, Urbanowicz has found the experi- 
ence so helpful that she is spearheading an 
improvisation group for graduate students 
at Dartmouth so that she and her classmates 
can continue to practise and share their skills. 

Trout-Haney, for her part, suggests that 
anyone who is unsure about whether improvisation 
is useful or right for them should “just 
take a deep breath and say, ‘yes, and’”. 


Rachel Bernstein is a science writer based 
in San Francisco. 
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PATENTS 
Leaving academia 


Scientists who patent their discoveries 
seem more likely to leave academia than 
those who do not, says a study published on 
5 December (B. Balsmeier and M. Pellens 
Econ. Lett. http://doi.org/qhv; 2013). The 
authors examined data for 1996-2005 
from a survey of 263 academic researchers 
in Belgium created by the Organisation 
for Economic Co-operation and 
Development, and compared responses 
about career paths with publication and 
patent records. They found that for each 
additional patent, up to a total of two, the 
scientist was approximately a third more 
likely to leave academia. The authors 
speculate that patenting reflects an interest 
in commercializing results, which is better 
rewarded outside academia. 


BIOMEDICAL RESEARCH 
Physician-scientists 

The number of US physician-scientists 
who conduct biomedical research as 

their primary profession has declined in 
the past 30 years, according to a report 

by the Federation of American Societies 
for Experimental Biology (FASEB) in 
Bethesda, Maryland. Physician Scientists: 
Assessing the Workforce finds that from 
1982 to 2011, the proportion with medical 
degrees or MD-PhDs doing research fell 
from 3.6% to 1.6%. It attributes the decline 
to factors such as longer training periods 
and rising debt. But MD-PhD holders have 
an edge in a tight funding environment 
because of their skill set, says report 
co-author Howard Garrison, FASEB’s 
deputy executive director for policy. 


WORKFORCE 
NIH seeks models 


In the wake of its 2012 report on the 
biomedical workforce (see Nature 492, 167; 
2012), the US National Institutes of Health 
(NIH) is again seeking proposals for 
computational models of that workforce, 
with the aim of tracking and mitigating 
long-term trends that threaten its size and 
diversity. Such trends include the rising 
number of biomedical-PhD holders who 
seek jobs outside academia and the lower 
numbers of female scientists who stay in 
academia after PhDs. Michael Sesma, a 
programme director for the NIH’s National 
Institute of General Medical Sciences, says 
that the agency needs models of dynamics, 
such as how people make career decisions. 
It will use them to develop grant and other 
programmes to address problem areas. 
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YOU ARE NOT THE FIRST 
MINION TO DISAPPOINT ME 


It’s just so difficult to get the staff these days. 


BY IAN CREASEY 


You have let me down. Your task was 
simple! You were supposed to distract 
Captain Nebulon and lure him 
into the Shadow Zone, where our phantoms 
could have toyed with him for years. Now he 
has slipped your grasp, and he reaches ever 
closer to the Neutronic Modulator. 

You are not the first minion to disappoint 
me. I’m sure you recall what happened to 
your predecessor — you oversaw his fate 
yourself. We always need fresh brains for 
the Monster Pit. 

How could you be so careless? Don’t you 
realize the importance of our scheme? We 
must obtain the Neutronic Modulator so 
that we can unlock the wormhole, solve the 
self-erasing paradox, defeat the Masters of 
Entropy, and thereby put the governance of 
the galaxy on a firmer foundation. 

I expected so much more from you. 
Because my previous subordinates had been 
consistently inept, I employed special meas- 
ures in your case. I decided that as other peo- 
ple were invariably disappointing, the only 
person I could rely upon was myself. And so 
I created you from my own flesh and blood. 

Well, flesh. I have very little blood left, 
after replacing it with athanatic serum. 
Cancer cells are immortal, so I turned myself 
into a giant tumour. Eternity is worth a few 
side effects. My teratogenic outgrowths are 
mostly non-functional, rarely causing much 
disturbance outside the nightmares of my 
enemies. And although I depend upon a 
constant infusion of fresh serum, I’m care- 
ful to maintain back-up systems in case of 
any interruption to my supply. 

I ramble, I know. It is the only way I can 
hear an intelligent word spoken in the depths 
of my lair. Certainly I hear little wisdom from 
you. Even now, you merely cower in front of 
me, with a vacant expression on your face. 
I would like to see a little more ambition. 
When I say “You are not the first minion 
to disappoint me”, I almost wish you would 
interrupt and say, “But I shall be the last!” 

I’ve told you before how I overthrew my 
predecessor. Whenever he doubted my 
loyalty, he used to lecture me on the perils 
of leadership: “This spiky titanium throne 
is not as comfortable as it looks. Being the 
Hyper-Dimensional 
Overlord is an endless series of frustrations. 
Inadequate minions are not the only source 
of disappointment, I can assure you.” 

Yet he appreciated my eagerness to con- 
coct new plots rather than merely follow 
directions. When I deposed him, he admired 
the efficiency of my coup. I know, because 
his wraith still haunts this throne and offers 
me advice. 

He always said that you lacked the drive 
to be a truly cunning henchman. Right now, 
he is urging me to dispose of you. Perhaps 
I shall, after I've finished explaining all the 
ways in which you have disappointed me. 

Let’s see... what else is there? You have 
chosen an incompetent underling, again. 
Your latest deputy is not as good at slink- 
ing undetected as he believes. Slinking is an 
important skill, one that the younger genera- 
tion neglects. 

You have barely spent any time in the 
Monster Pit, not even to ride the Chrome 
Leopards or ingratiate yourself with the 
Brain Worms. You must feed the Zaptors 
with your own hands, or you can hardly 
expect them to snap and snarl when you 
wish to impress a hostage with their ferocity. 

You — 

Wait! There is a disturbance in the 
dungeons. Someone has breached the 
perimeter of my back-up vault. Is it Captain 
Nebulon already? I thought he was busy 
questing for the Neutronic Modulator. We 
shall deal with him soon enough, when I 
have finished reprimanding you. As you 
seem unable to execute my schemes, per- 
haps you would like to suggest your own 
plan for dealing with him? 

Speak up! Don’t just slump with your jaw 
gaping. The appropriate time to gape your 
jaw is when you're terrorizing a victim by 
displaying your metallic fangs. And you 
haven't even installed metallic fangs yet. 
That's yet another sign of your — 

Ouch! What’s happening? The fresh 
serum is burning me. Aaargh! What have 
you done, you fool? 

Oh, I see. You've projected a hologram. 
Your image has been receiving my rebuke, 
while you silently sneaked behind me and 
poisoned my supply of athanatic serum. It’s 
about time! I’m glad to see you finally show- 
ing some initiative. 

Uuuuurck. Blarvle. Kkkksss. 

Those are not my last words: I would 
have uttered something more eloquent. You 
have succeeded in killing my body, but your 
deputy has failed to destroy my back-up sys- 
tems. He blundered around the dungeons 
without penetrating the inner vault. Inside 
the necrocomputer, my brain patterns per- 
sist. Their output is routed to a panel built 
into this titanium throne. 

Step forward! Switch off the hologram, 
and reveal yourself. Remove my carcass: its 
ugliness is offensive. I will permit you to kick 
it, if you like, but only once. Seat yourself 
upon the throne, and enjoy your ascent. I 
look forward to my new advisory role. 

You have already learned the frustrations of 
relying on others to perform crucial assign- 
ments. Your first duty as the new Hyper- 
Dimensional Overlord will be to rebuke your 
incompetent underling for failing to imple- 
ment his part of your scheme. Bring him in 
here. Then repeat after me: “You have let me 
down. Your task was simple! You are not the 
first minion to disappoint me...” 
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